Author: nigelb

  • Gluster Summit 2017

    Right after Open Source Summit Europe, we had Gluster Summit. It was a two-day event with talks and BoFs. I had two key things to do at the Gluster Summit. One was to build out the minnowboard setup to demo Tendrl. This didn’t work out. I had also volunteered to help with the video work. According to my plans, the minnowboard setup would take about an hour, and then I’d be free to help with camera work. I also had a talk scheduled for the second day of the event. I’d have expected one or two of these to go wrong. I didn’t expect all of them to go wrong 🙂

    The venue had a balcony, which made for great photos

    On the first day, Amar and I arrived early and set up the camera. The venue staff were helpful. They gave us a line out from their audio setup for the camera. Our original plan was to have speakers wear a lapel mic for the camera, but that was prone to speaker errors and would also need us to check batteries every few hours. When we first tried the line in, we had interference. The camera power supply wasn’t grounded (there wasn’t even a grounded outlet). The venue staff switched out the boxes they used for the line out, and it worked like a charm after that.

    We did not have a good start for the demo. Jim had pre-configured the networking on the boards at home and brought them to Prague. But whatever we did, we couldn’t connect to their network the night before the event, which was the day we had kept free for this. That night we gave up, because we needed a monitor, an HDMI cable, and a keyboard to debug it. At the venue, we borrowed a keyboard and hooked up a board to the monitor. The dnsmasq user didn’t exist, so dnsmasq wasn’t assigning IPs, and that’s why the networking didn’t work. Once we got past that, the next hurdle was getting the network to work with my laptop. That took a while. We decided to use a server in the cloud as the Tendrl server. By evening, we had run the playbook and got everything installed and configured. But I’d made a mistake: I used IPs instead of FQDNs, so the dashboard wouldn’t work. This meant re-installing the whole setup. That’s the point where I gave up on it.

    We even took the group picture from the balcony

    My original plan for my talk was to look at our releases, specifically to list what we committed to at the start of each release and what we finished with. There is definitely a gap. This is common in software projects, given how people estimate work. This topic was more or less covered on the first day, so I instead focused on how we fail: how we fail our users, developers, and community. I followed the theme of my original talk a bit, pointing out that we can solve large problems in smaller chunks.

    We’re running a marathon, not a sprint.

  • Open Source Summit Europe 2017

    In September, I attended Open Source Summit Europe. I realized this week that I haven’t written about it yet. The conference is massive. This edition had 2187 attendees from 65 countries, according to the post-event email. The great part of a big event is the chance to meet and socialize with people I don’t otherwise get to meet. The side effect is that you often have more than one session you want to attend at the same time. The other downside is that you’re about wiped out at the end of every day, even before the day’s social events start.

    I did not get to attend all the talks I wanted to. I did skip a few sessions and spent some time recovering. Here are the talks that I liked.

    The keynote by Reuben Paul about security was quite fun. It’s rare you see someone show a live demo on stage and have it work well. His talk is a reminder to developers that obfuscation is not security. When you build something, think about it from the point of view of an attacker. In the world of IoT, the question is not “if” you’re compromised, it’s “when” you’re compromised.

    My colleague Robert Kratky talked about modular documentation. This is one of my favorite talks from the event, and I took copious notes. The summary of the talk is to build modular, use-case-oriented documentation: “How to make an omelette” rather than a document that talks about knives, chopping, onions, and eggs. While references need to exist, documentation needs to solve users’ problems.

    I don't remember what I'm passionately arguing about

    The first evening, there was a CentOS-Fedora-EPEL BoF hosted by Jim, Peter and Brian. This session teased out problems in the ecosystem and showed how some of our solutions are not perfect. For instance, a package in EPEL cannot override a system package, because it’s meant for RHEL as well. A package in a CentOS SIG can override system packages. This is the recommended route for getting non-EPEL packages into CentOS.

    On the second day, I attended a half-day session about the CHAOSS project. It was a good introduction to metrics and what other communities do. Amar and I attended the session together, and we’ve come back with a long list of things we want to do to track the health of our community. The session was quite long, technical, and educational. If I had spare cycles to contribute, that’s where I’d be spending my time. This session also gave us a chance to talk to our friends from Bitergia.

    My talk was on the last day of the conference. I prefer having my talk on the first day, because it lets me be stress-free for the rest of the event. I talked about testing products where there’s a wide range of configuration options. The premise of my talk is that “Unless you can prove it with a test, a feature is assumed to not work.” My approach is that the product needs to focus on solving use-cases rather than features. We need to view every feature through the lens of what problem it helps solve. This lets us narrow down the configurations that work best for each use-case, which in turn reduces the permutations of configurations that need to be tested.
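    To put rough numbers on the permutation argument, here is a small Python sketch. The option names, values, and use cases are hypothetical, not Gluster’s actual options; the point is only how fast a full matrix of every option combination grows compared to a short list of use-case-driven configurations.

```python
from itertools import product

# Hypothetical configuration options: each key is an option, each list its values.
OPTIONS = {
    "volume_type": ["distribute", "replicate", "disperse"],
    "transport": ["tcp", "rdma"],
    "encryption": [True, False],
    "quota": [True, False],
}

# Hypothetical use cases, each pinned to the one configuration recommended for it.
USE_CASES = {
    "vm_image_store": {"volume_type": "replicate", "transport": "tcp",
                       "encryption": False, "quota": False},
    "archive": {"volume_type": "disperse", "transport": "tcp",
                "encryption": True, "quota": True},
}


def full_matrix(options):
    """Every combination of option values: the 'test every feature' approach."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*options.values())]


def use_case_matrix(use_cases):
    """Only the configurations tied to a use case we claim to support."""
    return list(use_cases.values())


if __name__ == "__main__":
    print(len(full_matrix(OPTIONS)))        # 3 * 2 * 2 * 2 = 24
    print(len(use_case_matrix(USE_CASES)))  # 2
```

    Each new option multiplies the full matrix, while the use-case list only grows when we commit to supporting a new use case.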

    Image credit: Linux Foundation OSS EU (license)

  • Airtel Delhi Half Marathon 2017

    After 2 years, I ran ADHM again this year. This time, I nearly dropped out because of the alarming pollution levels the week before the event. If Procam does not change the dates of the event in 2018, I don’t see myself running it again. We got lucky this time, with the air clearing up in the days leading up to the event.

    Delhi has been the best city for my running. This is where I went from C25K to running my first marathon over the course of 2 years. When we moved away, I almost stopped running completely. Bombay was not conducive to running for me. Running in a park with a 200 m loop gets old fast: it takes 25 loops to get to a 5K, and the monsoons are a constant threat. I haven’t run more than 400 km this year. I went into the race with little to no training, hoping that it would kickstart my training cycle for the next big race.

    I met Bulbul at the start line, and we started at about the same time. I was in the C group, given that my last ADHM timing was 2h 11m. Last time, I prepped for cold weather, forgetting that it warms up over the course of the race. This time, I was wearing my trusty shorts and T-shirt, which I’d worn in Brno and Prague in sub-10°C weather. Given my lack of training, I started slow and let people pass me over the course of the race.

    At the race expo, I picked up a timing chart for finishing in 3h. That was near perfect for me. I started the race with a mask as well, but it was uncomfortable because I hadn’t trained with it, and at around the 7 km mark I gave up on it. I banked a few minutes by running faster than planned in the first 5 km or so. I kept going faster than the plan, but slower than the pace for a 2h 45m finish. Until about the halfway mark, I was running with occasional walking breaks to drink water. After the 10.5K mark, I started holding the goal pace on my chart, which meant I slowed down. At around the 16K mark, my back was giving me trouble, so I decided to walk for a bit and then run. I walked most of the way back after that, picking up pace only after turning into the stadium. Procam had the aid stations well staffed, and there was water and ORSIL till the very end.

    As I expected, this has kickstarted my training. It’s a little challenging to go out in the winter to run, and it’s not very motivating when my performance is below where I left off. But I’ve done this before. I’ve gone from near zero training to fast half marathons. It’s only a matter of consistent training for the next year or so before I break my half marathon best again.

  • Remote Triggering a Jenkins Job

    All I wanted to do was trigger a job on another Jenkins instance. Here are all the things I tried:

    • I tried out a plugin. This plugin does not work.
    • I forked the plugin and applied some of the patches that others had contributed.
    • I wrote Python code to do it.
    • I wanted to get a “Build Cause” working and since that didn’t work on the first few tries, I added it as a parameter.
    Pretty much what I kept hitting

    It turns out that what I thought was working wasn’t actually working. I wasn’t passing parameters to my remote job; it was using the defaults. The fix for this problem is the most hilarious one I’ve seen: if you use the crumbs API and Jenkins auth, you don’t need the token. This was a lovely discovery after all this pain.
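    The Python code I wrote isn’t included here, so the following is only a minimal sketch of the crumb-plus-auth approach, not that code. It assumes Jenkins’ /crumbIssuer/api/json and /job/<name>/buildWithParameters endpoints, authenticating with a username and API token; the host, job name, and parameter names are hypothetical. The key detail is that there is no job-level build token in the URL. My understanding is that newer Jenkins versions tie crumbs to the web session, so the sketch keeps a cookie jar shared between the two requests.

```python
import base64
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def _auth_header(user, api_token):
    """HTTP Basic auth header built from a Jenkins username and API token."""
    raw = f"{user}:{api_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_trigger_request(base_url, job, params, user, api_token, crumb_field, crumb):
    """Build a POST to /job/<job>/buildWithParameters.

    There is no job-level build token (?token=...) in the URL: the request is
    authenticated as a user and carries a CSRF crumb instead.
    """
    url = f"{base_url.rstrip('/')}/job/{urllib.parse.quote(job)}/buildWithParameters"
    body = urllib.parse.urlencode(params).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", _auth_header(user, api_token))
    req.add_header(crumb_field, crumb)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def trigger_remote_job(base_url, job, params, user, api_token):
    """Fetch a crumb, then trigger the job; returns the queue item URL if given."""
    # Shared cookie jar so the crumb and the build request use the same session.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))
    crumb_req = urllib.request.Request(f"{base_url.rstrip('/')}/crumbIssuer/api/json")
    crumb_req.add_header("Authorization", _auth_header(user, api_token))
    with opener.open(crumb_req) as resp:
        crumb_data = json.load(resp)
    req = build_trigger_request(base_url, job, params, user, api_token,
                                crumb_data["crumbRequestField"], crumb_data["crumb"])
    with opener.open(req) as resp:
        # Jenkins normally answers 201 Created with a Location header for the queue item.
        return resp.headers.get("Location")
```

    A call would look like `trigger_remote_job("https://jenkins.example.com", "release-job", {"VERSION": "3.12.0", "CAUSE": "triggered by upstream"}, "bot", api_token)`, where every value is a placeholder. Passing the build cause as an ordinary parameter mirrors the workaround above.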

    Next, I needed to figure out how to follow the remote Jenkins job, i.e. get its console output in real time. I found a Python script that does exactly that. I tested it, and it works.
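    The script isn’t named here, so this is only a sketch of the general approach and not that script. It assumes Jenkins’ logText/progressiveText endpoint on a build URL, which takes a `start` byte offset and reports the next offset in the X-Text-Size header and whether the build is still producing output in X-More-Data. The polling loop takes the fetch function as an argument, so it can be exercised without a live Jenkins.

```python
import time
import urllib.request


def fetch_chunk(build_url, start, auth_header):
    """GET console text from byte offset `start`; returns (text, next_offset, more)."""
    req = urllib.request.Request(
        f"{build_url.rstrip('/')}/logText/progressiveText?start={start}")
    req.add_header("Authorization", auth_header)
    with urllib.request.urlopen(req) as resp:
        text = resp.read().decode("utf-8", errors="replace")
        next_offset = int(resp.headers.get("X-Text-Size", start))
        # Jenkins sends X-More-Data: true only while the build is still running.
        more = resp.headers.get("X-More-Data", "").lower() == "true"
    return text, next_offset, more


def follow_console(fetch, poll_seconds=2.0, sleep=time.sleep):
    """Yield new console text until the build reports no more data.

    `fetch(start)` must return (text, next_offset, more_data).
    """
    offset = 0
    while True:
        text, offset, more = fetch(offset)
        if text:
            yield text
        if not more:
            break
        sleep(poll_seconds)
```

    Usage would be along the lines of `for chunk in follow_console(lambda start: fetch_chunk(build_url, start, auth)): print(chunk, end="")`, with `build_url` and `auth` supplied by the caller.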

  • Upgrading the Gluster Jenkins Server

    I’ve been wanting to upgrade the build.gluster.org setup for ages. A lot of that setup no longer matches how people use Jenkins today.

    We used Unix user accounts for access to Jenkins. This meant Jenkins needed to read /etc/passwd, and everyone had SSH access via passwords by default. Very often, the username wasn’t tied to an actual email address, so I had to guess the account owner based on their usernames elsewhere. This setup was also open to brute-force attacks. The only way to change passwords was to log in to the server and run the passwd command. We fixed this problem a few months ago by switching our auth to GitHub. Now, membership in a GitHub group is what grants extra permissions; logging in by itself gives you no more permissions than not logging in.

    Our todo list during the Jenkins upgrade

    The Jenkins community now recommends not running jobs on the master node at all. But our old setup depended on certain jobs always running on master. One by one, I’ve eliminated them so that they can now run on any agent node. The last job left is our release job. We make the tarball from every release available on an FTP-like server. In our old setup, this server and Jenkins were the same machine, and the release job ran on master because it depended on that. We decided to split up the systems so we could take down Jenkins without any issue. We intend to fix this with an SCP command at the end of the release job to copy artifacts to the FTP-like server.
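    The final copy step could look something like the following Python sketch. The build directory layout, tarball glob, remote user, host, and destination path are all hypothetical; only the idea of an scp at the end of the release job comes from the plan above. Building the argument list separately from running it makes the command easy to log and check.

```python
import shlex
import subprocess
from pathlib import Path


def find_tarballs(build_dir):
    """Return release tarballs from the build (assumes a hypothetical *.tar.gz layout)."""
    return sorted(Path(build_dir).glob("*.tar.gz"))


def scp_command(artifacts, dest_host, dest_dir, user="jenkins"):
    """Build the scp argv that copies release artifacts to the download server."""
    if not artifacts:
        # Fail loudly rather than "succeed" by copying nothing.
        raise ValueError("no release artifacts to publish")
    return [
        "scp",
        "-B",  # batch mode: fail instead of prompting for a password
        "-p",  # preserve modification times
        *[str(a) for a in artifacts],
        f"{user}@{dest_host}:{dest_dir}",
    ]


def publish_release(build_dir, dest_host, dest_dir, user="jenkins"):
    """Copy the tarballs; check=True raises CalledProcessError if scp fails."""
    cmd = scp_command(find_tarballs(build_dir), dest_host, dest_dir, user)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

    Run as the last build step, an uncaught error makes the script exit non-zero, so Jenkins marks the release job as failed instead of silently skipping the upload.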

    One of the Red Hat buildings in Brno

    Now, we have a Jenkins setup that I’m happy with. At this point, we’ve fixed the vast majority of the annoying CI-related infra issues. In a few years, we’ll rip them all out and re-do them. For now, spending a week with my colleague in Brno working on an Infra sprint has been well worth our time and energy.