Friday, June 08, 2018

Hope is my strategy

One of my favorite tech books is Google's Site Reliability Engineering book. They open with this tongue-in-cheek quote:

  "Hope is not a strategy" --  Traditional SRE saying

This attitude reminds me of the common idea in system administration that anything that can go wrong will go wrong. Murphy's law, "never trust a happy system administrator", and so on.

When your goal as a system admin is "100% service uptime", there's simply no way to meet that goal. You can only fail.

The authors of the SRE book weigh uptime metrics against their business costs and value. They propose a different strategy: focus on an "error budget" instead of the unachievable goal of pure 100% uptime. Exposing this error budget lets business decision makers align with the inherent constraints engineers face when trading speed against safety.
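The arithmetic behind an error budget is simple. Here is a small sketch; the 99.9% SLO is just an example figure, not one from the book:

```python
# Sketch: turning an availability SLO into an "error budget" of
# allowed downtime over a measurement window.
def error_budget_minutes(slo_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a window of `days` days."""
    total_minutes = days * 24 * 60
    return (100.0 - slo_percent) / 100.0 * total_minutes

# A 99.9% SLO leaves roughly 43 minutes of downtime per 30-day month.
print(error_budget_minutes(99.9))
```

As long as the service stays within that budget, engineers are free to take risks (ship faster, run experiments); when the budget is exhausted, they slow down and focus on reliability.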

"Hope is not a strategy" means making our decisions data-driven rather than wishful-thinking-driven.

Operating like this means gathering a *lot* of data. Lots of monitoring, A/B testing, phased rollouts, and so on.

How much data is enough, though?

If you're Google or another big corporation, you can spend a lot of resources on monitoring and benchmarking. There's always more to measure, tweak and improve.

In my own life I can see the effects of wanting more and more data before making big personal decisions. It often means I delay beyond what's reasonable and miss out on opportunities because I'm risk-averse.

There are two sides to this problem:

- Bad: Wishful thinking, blind optimism, recklessness.
- Also bad: Analysis paralysis, perfectionism, fear.

In 2018 I've faced some hard decisions in my personal life, where I have to make choices every week about how I'm going to live and what I'm going to do. These choices affect others around me as well.

At some point I have to stop gathering data. I don't have the resources to do the exhaustive research I daydream about for every decision. And even if I did, it's pure fantasy to think I can avoid pain and suffering in this life.

This prayer about serenity has really helped me this year:

  God grant me the serenity
  to accept the things I cannot change;
  courage to change the things I can;
  and wisdom to know the difference.

Of course I have to dig in and do the hard work - that's the "courage to change" part. But when decisions are murky and things are unclear, that is where serenity and wisdom come in. That is where hope is my strategy.

Where does your hope lie?

Tuesday, February 06, 2018

What is the place for one-on-one communication in open-source?

One of Red Hat's mantras is "develop in the open". There is an entire website, opensource.com, with tons of articles about this idea (this article in particular is great).

One aspect of "develop in the open" means keeping conversations as public as possible. Don't email or IM a developer directly; instead, email a development mailing list (possibly using the To: and CC: fields for your intended developer) or a public IRC channel. It's hard to overstate the community benefits of this, and again opensource.com explains the benefits in more detail than you could ever want.

Sometimes people send me direct instant messages seeking information, instead of asking in a public channel. I think there is a fear of "spamming the channel" or fear of looking foolish. I can respect people's desire to avoid looking foolish. I've even done the same, and some wise people called me out on it. I suggest that you will not look nearly as foolish as you expected, though. Let's face it, if this topic was so obvious, you would have found some documentation on it already, right :) Maybe things are just hard to figure out! Maybe many other people would benefit from this probably-under-documented information!

In these conversations, I try to steer back into an IRC channel, replying "That's a great question. Would you be ok if we continued the conversation in #channel-that-relates-to-what-we-are-talking-about?" Then I tab over to that channel and say "so-and-so: we were just discussing <my rephrasing of their question>" to give some context to everyone else in the room. Then I answer the question so everyone can see it.

I've been thinking about a corollary to this concept this week: There is also a time for one-on-one IM conversations, and that is when you have to bring up a sensitive topic and you need to build some relational credibility.

Let's say I've noticed a mistake in some code or process. Let's also imagine I do not have positive relational credibility with the person responsible. Maybe this person is a different personality type than me, and we both drive each other nuts. Maybe it's been a pressure-cooker situation for any number of other reasons. If I bring up this person's mistake in a public IRC channel, we just go deeper on the negative spiral, and my behavior can look threatening. I've found it's more effective to bring up mistakes as privately as possible.

Of course we want to default to open and develop in the open. On the other hand, sometimes there is a greater good, where we trade a bit of openness for relational credibility. Once the relationship is there, maybe we'll get a chance to discuss future problems more openly without fear.

Wednesday, June 28, 2017

Forwarding gpg-agent to a container

I use Fedora on my main laptop, but sometimes I need to GPG-sign something in an Ubuntu environment.

I store my GPG key on my Yubikey and access the device with gpg-agent. Here are instructions for forwarding my gpg-agent connection into a Docker container.

This will only work with an ubuntu:xenial image or newer, because Trusty has GPG 2.0, and this needs 2.1. Earlier versions of GPG 2 fail because they still need access to the data in secring.gpg. See https://www.gnupg.org/faq/whats-new-in-2.1.html#nosecring for more information.

On the host, bind-mount the gpg-agent socket when running the container:

docker run --volume /home/kdreyer/.gnupg/S.gpg-agent-extra:/gpg-agent --env GPG_AGENT_INFO=/gpg-agent:0:1 -ti ubuntu:xenial

Within the container: Xenial's gpg2 looks for the socket in ~/.gnupg, ignoring GPG_AGENT_INFO, so we have to link it in:

mkdir -p ~/.gnupg && chmod 700 ~/.gnupg
ln -s /gpg-agent ~/.gnupg/S.gpg-agent

Trust the kdreyer@redhat.com key:

gpg2 --keyserver keys.fedoraproject.org --recv 478A947F782096AC
echo -e "trust\n5\ny\n" | gpg2 --command-fd 0 --edit-key kdreyer@redhat.com

Test a signature operation:
echo hi | gpg2 -as -u kdreyer@redhat.com --use-agent 

Now we can use GPG with other tools, for example debsign:
debsign -p gpg2 tambo_0.4.0-0ubuntu0.16.04.1_source.changes

Note there's a bug in dput: it hardcodes the use of /usr/bin/gpg when verifying signatures, so you'll have to import your key again into the GPG 1 keystore:
gpg --keyserver keys.fedoraproject.org --recv 478A947F782096AC

And then you can upload to a Launchpad PPA:
dput ppa:kdreyer-redhat/ceph-medic tambo_0.4.0-0ubuntu0.16.04.1_source.changes

Wednesday, October 29, 2014

Sigal packaging and CentOS


My home server was running CentOS 6, and this was getting a bit long in the tooth:

  • The libwww-perl version that ships in CentOS 6 does not handle HTTPS certificates in a secure way. This was only fixed in LWP version 6. There's almost no chance of LWP getting rebased, since that module is part of Perl core, and it's so late in the RHEL 6 lifecycle.
  • The Python version (2.6) is so old that many Python apps no longer support it. The one I was particularly interested in was Sigal to generate my own photo gallery for my family.

I tried using the Python 3.3 software collections, and this worked well to get Sigal running in a Python 3.3 virtualenv.

For Perl, I didn't want to deal with SCLs, because my application has a long dependency chain, and I would need to rebuild a lot of SCL-style RPMs to get my app to work. I could just use the "cpan" tool (similar to virtualenv/pip), but I wanted to avoid the security and stability issues associated with using an essentially random snapshot in time of modules that I grabbed from upstream. I like the fact that Bugzilla is a central place to track CVEs, and I like the waiting period in epel-testing and the possibility for community collaboration there, etc.

The idea of using multiple SCLs and lots of non-packaged upstream modules was what pushed me to just bite the bullet and update to CentOS 7. CentOS base + CentOS extras + EPEL 7 already had all the deps for Sigal, except python-pilkit. I buckled down and learned just enough Python packaging techniques in order to package python-pilkit and python-sigal. And the best part is that the packages actually work on my new EL7 system (knock on wood).

Sigal bundles some JavaScript bits, and I'm not sure about the JavaScript guidelines for EPEL. But otherwise I think the packages are close to being ready to submit to Fedora.

Wednesday, December 04, 2013

work

"What we want is not more little books about Christianity, but more little books by Christians on other subjects—with their Christianity latent."

- C.S. Lewis

Thursday, April 15, 2010

Typepad Antispam

I've just set up Typepad's open-source Akismet backend, also known as "Serotype". This software internally uses Perlbal for HTTP communication, Gearman to delegate instructions, and dspam for content-based spam filtering. Other software requirements are MySQL and memcached.

Documentation is pretty scarce; a README file is basically all you get. However, if you're familiar with Perl you should be good to go. I put the pieces together in a CentOS 5 VM. Many of the required Perl modules were already in EPEL, but I did have to get some things directly from CPAN.

Here are my initial thoughts:
  • Thank you TypePad for making this open source, and releasing it to the world!
  • Most of Typepad's software is in Perl, and they are the creators of Perlbal/Gearman, so no surprise that this software is based on that as well. Since it uses Gearman, this Serotype server should be able to scale massively.
  • Once I installed all the required Perl modules, the software essentially worked "out of the box". I did need to adjust the Gearman client timeout to fifteen seconds. I traced this delay to the yuidd daemon. I'm not sure why it can take up to ten seconds to give me a UID.
  • The handling of API keys is very loose; the web service accepts any API key by default. However, only keys that are "blessed" are able to actually train the database.
  • I wish there were an easier way to "prepopulate" the database with spam.
Web forms are the spammers' new battlefield. It's a good thing the Akismet API exists.
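Since Serotype speaks the Akismet API, any Akismet client can check comments against it. Here is a minimal sketch of building a "comment-check" request in Python; the local server URL and the sample field values are hypothetical, but the form fields (blog, user_ip, comment_content, ...) follow the Akismet API:

```python
# Sketch of an Akismet-style comment-check request, as a client of a
# Serotype instance might build it. URL and values are hypothetical.
from urllib.parse import urlencode
from urllib.request import Request

def comment_check_request(base_url: str, blog: str,
                          user_ip: str, content: str) -> Request:
    """Build a POST request for the Akismet comment-check endpoint."""
    body = urlencode({
        "blog": blog,
        "user_ip": user_ip,
        "comment_type": "comment",
        "comment_content": content,
    }).encode("utf-8")
    return Request(
        f"{base_url}/1.1/comment-check",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

req = comment_check_request("http://localhost:8080",
                            "http://example.com",
                            "203.0.113.5",
                            "Buy cheap meds")
# urllib.request.urlopen(req) would then return "true" for spam
# or "false" for ham, per the Akismet API.
```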