Author: nigelb

  • Catching up with Infrastructure Debt

    If you run an infrastructure, there’s a good chance you have some debt tucked in your system somewhere. There’s also a good chance that you’re not getting enough time to fix those debts. There will most likely be a good reason why something is done in the way it is. This is just how things are in general. After I joined Gluster, I’ve worked with my fellow sysadmin to tackle our large infrastructure technical debt over the course of time. It goes like this:

    • We run a pretty old version of Gerrit on CentOS 5.
    • We run a pretty old version of Jenkins on CentOS 6.
    • We run CentOS 6 for all our regressions machines.
    • We run CentOS 6 for all our build machines.
    • We run NetBSD on Rackspace in a setup that is not easy to automate nor is it currently part of our automation.
    • We have a bunch of physical machines in a DC, but we haven’t had time to move our VMs over and use Rackspace as burstable capacity.

    That is in no way an exhaustive list. But we’ve managed to tackle 2.5 items from the list. Here’s what we did in order:

    • Upgraded Gerrit to the then latest version.
    • Set up Gerrit staging to test newer versions regularly for scheduling migration.
    • Created new CentOS 7 VMs on our hardware and moved the builds in there.
    • Moved Gerrit over to a new CentOS 7 host.
    • Wrote Ansible scripts to manage most of Gerrit, but currently deployed only to staging.
    • Upgraded Jenkins to the latest LTS.
    • Moved Jenkins to a CentOS 7 host (Done last week, more details coming up!)

    If I look at it, it almost looks like I’ve failed. But again, like dealing with most infrastructure debt, you touch one thing and you realize it’s broken in some way and someone depended on that breakage. What I’ve had to do is pick and prioritize what things I would spend my time on. At the end of the day, I have to justify my time in terms of moving the project forward. Fixing the infrastructure debt for Gerrit was a great example. I could actually focus on it with everyone’s support. Fixing Jenkins was a priority since we wanted to use some of the newer features, and again I had backing to do that.

    Moving things to our hardware is where things get tricky. There are some financial goals we can hit if we make the move, but outside of that, we have no reason to move. But long-term, we want to be mostly on our hardware, since we spent money on it. This is, understandably, going slowly. There’s a subtle capacity difference and the noisy neighbor problem affects us quite strongly when we try to do anything in this regard.

  • Problems You Might Run Into Upgrading PostgreSQL on Fedora

    I was trying to test some code today and I realized I need a working PostgreSQL server. When I tried to start the server, it failed with this error.

    Aug 23 15:36:10 athena systemd[1]: Starting PostgreSQL database server...
    Aug 23 15:36:10 athena postgresql-check-db-dir[20713]: An old version of the database format was found.
    Aug 23 15:36:10 athena postgresql-check-db-dir[20713]: Use 'postgresql-setup --upgrade' to upgrade to version '9.6'
    Aug 23 15:36:10 athena postgresql-check-db-dir[20713]: See /usr/share/doc/postgresql/README.rpm-dist for more information.
    Aug 23 15:36:10 athena systemd[1]: postgresql.service: Control process exited, code=exited status=1
    Aug 23 15:36:10 athena systemd[1]: Failed to start PostgreSQL database server.
    Aug 23 15:36:10 athena systemd[1]: postgresql.service: Unit entered failed state.
    Aug 23 15:36:10 athena systemd[1]: postgresql.service: Failed with result 'exit-code'.

    Ah, I upgraded to F26 recently and I suppose that came with a new version of PostgreSQL. I figured fixing this should be trivial. Well, not exactly. When I first ran the command, it asked me to install the postgresql-upgrade package. Once I did install it, the command threw a strange error.

    [root@athena pgsql]# postgresql-setup --upgrade
     * Upgrading database.
    ERROR: The pidfile '/var/lib/pgsql/data-old/postmaster.pid' exists.  Verify that there is no postmaster
           running the /var/lib/pgsql/data-old directory.
    ERROR: Upgrade failed.
     * See /var/lib/pgsql/upgrade_postgresql.log for details.

    The /var/lib/pgsql/data-old/postmaster.pid file doesn’t even exist. It took me some time to realize that it’s actually looking at /var/lib/pgsql/data/postmaster.pid, which does exist. I think at some point, I had a running PostgreSQL server and I didn’t shut down the computer cleanly. This led to a stale PID file. Once I renamed the PID file, the upgrade command worked.
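
    Before renaming a PID file like this, it can help to confirm that no process is actually behind it. What follows is a minimal sketch, assuming a Linux host; the function name pidfile_status is hypothetical and is not part of postgresql-setup. It relies on the first line of postmaster.pid holding the postmaster's PID, and it only reports a status without deleting or renaming anything.

```shell
# Hypothetical helper: report whether a pidfile is "missing", "stale", or "running".
# The body runs in a subshell "( ... )" so its variables do not leak to the caller.
pidfile_status() (
    pidfile="$1"

    if [ ! -f "$pidfile" ]; then
        echo "missing"
        exit 0
    fi

    # The first line of postmaster.pid is the PID of the postmaster process.
    pid="$(head -n 1 "$pidfile" | tr -d '[:space:]')"

    # Anything that is not a plain number cannot refer to a live process.
    case "$pid" in
        ''|*[!0-9]*) echo "stale"; exit 0 ;;
    esac

    # kill -0 sends no signal and only tests for existence; the /proc check
    # covers processes owned by another user when not running as root.
    # Caveat: if the PID has been reused by an unrelated process, a stale
    # file will still be reported as "running".
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        echo "running"
    else
        echo "stale"
    fi
)
```

    Calling pidfile_status /var/lib/pgsql/data/postmaster.pid and seeing "stale" would suggest the file is safe to move out of the way before retrying the upgrade.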

  • Clang Analyze for Gluster

    Deepshika recently worked on getting a clang analyze job for Gluster set up with Jenkins. This job worked on both our laptops, but not on our build machines that run CentOS. It appears that the problem was that clang on CentOS is 3.4 vs 4.0 on Fedora 26. It fails because one of our dependencies needs -fno-stack-protector, which wasn’t in clang until 3.8 or so. It’s been on my list of things to fix. I realized that the right way would be to get a newer version of clang on Fedora. I could have just compiled clang or built 4.0 packages, but I didn’t want to end up having to maintain the package for our specific install. I decided to reduce complexity by doing a compilation inside a Fedora 26 chroot. This sounded like the option least likely to add maintenance burden. When I looked for documentation on how to go about this, I couldn’t find much. The mock man page, however, is very well written and that’s all I needed. This is the script I used, with comments about each step.

    #!/bin/bash

    # Create a new chroot
    sudo mock -r fedora-26-x86_64 --init

    # Install the build dependencies
    sudo mock -r fedora-26-x86_64 --install langpacks-en glibc-langpack-en automake autoconf libtool flex bison openssl-devel libxml2-devel python-devel libaio-devel libibverbs-devel librdmacm-devel readline-devel lvm2-devel glib2-devel userspace-rcu-devel libcmocka-devel libacl-devel sqlite-devel fuse-devel redhat-rpm-config clang clang-analyzer git

    # Copy the Gluster source code inside the chroot at /src
    sudo mock -r fedora-26-x86_64 --copyin $WORKSPACE /src

    # Execute commands in the chroot to build with clang
    sudo mock -r fedora-26-x86_64 --chroot "cd /src && ./autogen.sh"
    sudo mock -r fedora-26-x86_64 --chroot "cd /src && ./configure CC=clang --enable-gnfs --enable-debug"
    sudo mock -r fedora-26-x86_64 --chroot "cd /src && scan-build -o /src/clangScanBuildReports -v -v --use-cc clang --use-analyzer=/usr/bin/clang make"

    # Copy the output back into the working directory
    sudo mock -r fedora-26-x86_64 --copyout /src/clangScanBuildReports $WORKSPACE/clangScanBuildReports

    # Clean up the chroot
    sudo mock -r fedora-26-x86_64 --clean

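
    The version mismatch described in this post (clang 3.4 on CentOS vs 4.0 on Fedora 26) is the kind of thing a job could check for up front. This is a rough sketch and not part of the original script: the function name version_at_least is hypothetical, the 3.8 threshold comes from the "3.8 or so" estimate in this post, and it assumes a sort that supports -V (GNU coreutils does).

```shell
# Hypothetical helper: version_at_least HAVE WANT
# Succeeds (exit 0) when version HAVE is greater than or equal to WANT.
version_at_least() {
    # sort -V orders version strings numerically per component, so 3.10 > 3.8.
    # If WANT comes out first (or both are equal), HAVE is new enough.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example use (commented out so sourcing this snippet has no side effects):
#   have="$(clang --version | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)"
#   version_at_least "$have" 3.8 || echo "clang $have is too old for this job" >&2
```

    A check like this fails fast with a readable message instead of surfacing as an unknown compiler flag halfway through the build.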
  • Crucial Conversations Training

    In the first week of July, I attended an internal training on Crucial Conversations. I’ve been eyeing that training ever since I started at Red Hat. It’s a skill that I’m poor at. I tend to avoid difficult conversations. When I do have them, I let emotions get the better of me or take the path that has the least amount of conflict. The training involves hands-on practice with methods and techniques taught in the book. I’d read about half the book before I went into the training, but the training was way more effective.

    One of the possible outcomes of a crucial conversation

    I learned two important lessons from this training. One, you can get into a conversation and very often it can turn into who’s winning and who’s losing. At this point, it’s very likely you’ve lost track of the original goal of the conversation. The second is to observe when a conversation is escalating due to aggression or silence. De-escalate the conversation first. Otherwise, you’ll have one of two things happen. The other person will be angry and the conversation becomes a conflict. Or the person will agree to everything while being utterly unhappy about it.

    Image Credit: Jule Falk on Flickr (license)

  • Open Source Summit Tokyo 2017

    I spoke at the Linux Foundation’s Open Source Summit in Tokyo, Japan. The event was co-located with the Automotive Linux Summit. This post is a (very delayed) report of the event as well as the trip to Japan. Amye and I did an updated version of the Linuxcon Berlin talk to a Tokyo audience. It had some audience, but it’s time we retired this talk.

    I managed to wake up early enough to run one morning

    Flying to Tokyo was pleasant. Bombay has a direct flight to Narita with ANA. I got one of the emergency row seats, so I didn’t have much of a view. Narita airport had the most efficient immigration and customs I’ve seen. I was out of the airport in 30 mins. I planned it out so I took the Keisei Access Express to Shimbashi and then the Yurikamome line to the hotel. Despite not knowing any Japanese, I could find my way. I was quite exhausted when I got to the hotel and I wasn’t looking forward to the 3 pm check-in. Luckily, the room was already ready. I considered ordering room service, but I went exploring at the mall nearby. The place had English menus and very nice food.

    Lovely lunch during one of the conference days

    I had no connectivity trouble despite the lack of a local phone connection. Most metro stations and malls have free internet. I was using Telegram, Twitter, and Slack all the time. The Hilton Wi-Fi was pretty strong too.

    Dinner for Speakers, Sponsors and Press

    I spent the rest of the day catching up on sleep. That night I made the mistake of going to bed with the curtains open. I jumped awake next morning thinking I’d slept in. It turns out that the sunrise in Tokyo is around 0430. So what felt like 0900 to me was actually 0500. It took a while to go back to sleep. I spent the morning finishing up our slides. Then, I went out exploring a bit of Tokyo and stationery shopping at Itoya. I wanted to do more touristy things, but carrying around bags of stationery wasn’t fun. By the time I got back and dropped off the stuff at the hotel, I was too tired to go back out exploring. I roamed around Odaiba instead and went hunting for the conference venue. I’m glad I did because I got lost. I found the venue, picked up my badge and met fellow Red Hatters who were setting up the booth.

    One evening, I walked up to the bridge and went half way across it. It has very pretty views

    Our talk was on the first day of the conference, which was a good thing. I was stress-free after the first half of the day. I met Arun at the event, which was a pleasant surprise. The rest of the conference provided an opportunity for Bex and me to talk about documentation. We’re working on adopting the same documentation tool.

    You can't beat this 'demo'

    The Automotive Linux Summit had fantastic demos. I felt like the booths for the Open Source Summit weren’t as fantastic. The ALS folks had displays and car simulators that showed how their displays work. I helped with the booth duty for a couple of shifts, but, again, we didn’t have enough interesting things to demo. While walking around those booths, I had an idea for a Gluster demo. If all works out, I plan on putting that up for the next Open Source Summit in Prague.

    Out of the talks that I attended, the ones that stood out are:

    Image credit: Linux Foundation Speaker Party Lexus Demo (license)