10 Years in the Industry

In May 2020, I passed a quiet milestone: I completed 10 years in the IT industry. I remember interviewing for my first job. In retrospect, what they wanted was a (Technical) Program Manager. During my interview, they realised that I could code and decided to see how I would do with a programming task. I did reasonably well, based on my experience in the Open Source community. I got assigned more work, including tasks that required learning entirely new things, and I’d like to think I handled those well too. I can’t remember much of that job other than building out custom CMSes from scratch. Since then, I’ve worked as a sysadmin, a web developer, a conference organiser, a tree sheriff, a CI architect, and a Corporate Operations Engineer.

I’ve found that what I really enjoy is work that blends Operations and Software Engineering. Purely Software Engineering projects tire me out, especially when people take no notice of the operational implications of their work or of how the software gets deployed. Too much operations work frustrates me: we end up putting out fires without spending enough time fixing long-term issues or evaluating how to get out of the hole we’ve dug for ourselves.

When I started my career, I knew mostly PHP and Python. Over the years, I spent a lot more time strengthening my Python and picked up some JavaScript along the way. Not a lot of it, mind you, but just enough to read code and understand what it does. Recently, I’ve used a lot more PowerShell and Golang. I’m teaching myself modern C++, which is a very different language from the one I learned in school. I’m surprised that I can understand Java, though I wouldn’t be able to write anything from scratch in it. In some shape or form, I’ve now maintained Ubuntu, RHEL, CentOS, Fedora, Debian, NetBSD, FreeBSD, and most recently Windows.

I’ve worked in offices and from home. Looking back, I think I’ve spent more time working from home than at an office. In the current crisis, I find myself better prepared thanks to that reserve of experience. It took a little time to find my rhythm, but now I’m productive without overworking.

I wake up at 5 am and start my day sipping tea and planning my schedule from my todo list. Once I’m happy with the plan, I go for a run or try to do some strength workouts at home. The latter is challenging; I find it easier to lace up and go outside for a run. Then it’s a shower followed by breakfast. I glance at my email to estimate how much time I’ll need to spend dealing with it. The advantage of emails from the US West Coast is that replies can usually wait.

When I start work, I do code reviews first, while I’m really fresh, and then get into writing my own code. I protect my morning hours furiously, because they’re my most productive time. Around 12, I take a long-ish lunch break. After lunch, I get to my email and other administrivia. Then I have another coding block or time for another big chunk of work. Sometimes, it’s chasing down a particularly nasty bug. From around 3 pm, my focus begins to fade, and from then on is when most of my meetings happen. This is also when I tend to plan the next day. I have an alarm that rings at 6:30 pm, and that’s my deadline to stop work. I may look at things again later, but only after dinner and some time with the guitar.

New Country and New Job

I thought that when I moved my website to WordPress, I’d blog more. If anything, I’ve blogged less. But hey, I have a good reason. About 4 months ago, I moved to Dublin, Ireland, to start my job at The Search Engine company. Today marks 4 months in Dublin, and it feels like much longer. It’s been an interesting and stressful few months. All the research we did in the months before the move paid off, especially while house hunting.

Ever since I got here, I’ve felt like I’ve been working through tasks to get my life stable. The first job was finishing all my paperwork and initial setup: registering with immigration, getting my PPS number, and getting set up with a phone. Once I had the essentials, the next step was finding an apartment. The Dublin rental scene is particularly stressful, but thankfully, I had relocation assistance. As it turned out, we’re often better at research, and we found the apartment ourselves. We used the relocation folks to negotiate the lease and help with the initial tasks of moving in.

We now live in a quiet North Dublin suburb. It’s so quiet that the loudest noise is often our ears ringing from the sheer silence. The beach is a short walk away, but we can’t see it from the apartment. My morning commute is a comfortable 45 mins by bus and train, or about 25 mins by bicycle. When the weather is good, I cycle in to work. Living here, I’ve realised why people run in the afternoon: the afternoon weather is often pleasant and not the kind that tries to kill you.

The job itself is interesting. I’m in the kind of role I’ve enjoyed a lot in the past: a mix of ops work as a sysadmin and some automation. The idea is to automate away as much of our day-to-day ops work as possible. There are 10 of us on my team across two time zones, and we manage to do far more work than should be possible. The biggest difference from my previous jobs is that I work with Windows more often. It’s fun to learn new things. Everything is different, and sometimes things don’t work because Windows.

Hanging up my red fedora

Feb 12th was my last day at Red Hat. I sent a very similar version of this blog post as a note to my colleagues. It’s been a fun nearly 3 years working at Red Hat and wearing the Red Hat fedora[1]. I’ve had a wonderful time working for Red Hat from both the New Delhi and the Mumbai offices, and I’ve enjoyed the odd visits to Bangalore over the last few years. I’m very grateful for my time here, and now it’s time to move on to bigger adventures elsewhere.

In 2015, I did not think that a chance conversation with Sankarshan at FUDCon Pune would lead to interviewing at Red Hat 6 months down the line and subsequently working here. Over my 11-year career, this is the first time I’ve had a fantastic manager who has been both a friend and a mentor. Remote work is challenging in general, and I could not have dealt with its various challenges without Sankarshan’s help and encouragement. I’m grateful to the office managers in Delhi and Mumbai for giving me a second home whenever I wanted to meet people[2].

I’m leaving with great memories, friendships, and lessons learned. I’ve had the opportunity to help stabilise the Gluster infrastructure, and looking back at how things used to be, I’m glad to see a sea change. I could not have been successful at Red Hat without the help of folks in my team in Gluster and in other parts of Red Hat. In particular, I’m grateful to Sankarshan, Alfredo, Amye, Atin, Jeff Darcy, John Strunk, Nithya, and Shyam.

The fondest and funniest memory of my time at Red Hat is going to be that time Jenkins started speaking French. If you don’t remember or don’t know about it, you should read the post-mortem for that failure. I also wrote a blog post about it last week.

I will no longer be a Red Hat employee, but I’m still going to be a Gluster community member. I’ve been on Freenode for more than 10 years, and I suspect I’ll be there for many years to come. If you want to stay in touch, IRC is the best way to reach me and get a response[3].

[1]: I didn’t actually get my Red Hat Fedora, but let’s not get into semantics 😛
[2]: Or just sit in air conditioning.
[3]: Unless you want to talk to me about an infra issue, in which case, file a bug 😀

The Funniest Incident Postmortem

Recently, I had a chance to think about an outage I debugged and fixed a few years ago that involved Jenkins and systemd (or, in this case, the lack thereof!).

Generally, if you want to run a task at the end of every Jenkins job, whether the job has passed or failed, you have two options. You could use trap and write a cleanup function, which I highly recommend. Or you could be like me and write a post-build publisher that runs a script if it finds the line “Building remotely” in the console output. It’s quite hacky, but since the first line of every job is “Building remotely”, it works. I used to depend on this for cleanup on a couple of Jenkins jobs, and later removed it because of this infamous outage.
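For contrast with the console-matching hack, here is a minimal sketch of the trap approach, assuming a bash build step. The run_with_cleanup function and its scratch-directory argument are hypothetical names for illustration. The cleanup is registered on EXIT, so it runs whether the wrapped command passes or fails, and it does not depend on anything Jenkins prints to the console.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a job command with guaranteed cleanup.
# The function body is a subshell ( ... ), so the EXIT trap fires when
# this function returns, not only when the whole build script ends.
run_with_cleanup() (
    scratch_dir="$1"
    shift
    mkdir -p "$scratch_dir"
    # Registered on EXIT: runs on success and on failure alike.
    trap 'rm -rf "$scratch_dir"' EXIT
    # Run the actual job command; the subshell exits with its status.
    "$@"
)

# Usage in a job (make_rpms.sh is a placeholder for the real build step):
#   run_with_cleanup "$WORKSPACE/scratch" ./make_rpms.sh
```

Wrapping the body in a subshell keeps the trap local to one command, so a build script can use the helper several times without traps overwriting each other.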

The Problem

Let me preface this by stating that it happened due to a combination of factors that I don’t expect to repeat. We were using an old version of Jenkins on an old version of CentOS. This meant it was still using init scripts rather than systemd, and an init script is just a shell script.

If you didn’t already know, SSH is commonly configured (via the client’s SendEnv and the server’s AcceptEnv options) to forward your LANG and LC_* settings to the machine you connect to, so the remote session uses your locale. I use en_US, but my French sysadmin colleague uses the fr_FR locale. This means that if I connected to a server, I would get English error messages, and if he did, he would get French ones.

When my colleague restarted Jenkins on that fateful day, his locale leaked into the Jenkins init script, possibly due to a bug. Voilà! Jenkins now spoke French. This meant my cleanup didn’t work anymore: instead of “Building remotely”, the console said “Construction à distance”. Obviously, all the jobs failed.
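For comparison, here is a hedged sketch of one general way to stop a caller’s locale from leaking into a service: start it from an empty environment with PATH and LANG pinned explicitly. This is not what we did at the time. The function name, the PATH value, the choice of en_US.UTF-8, and the /etc/init.d/jenkins path in the usage comment are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command (e.g. an init script) with an
# empty environment plus a fixed PATH and locale, so the caller's
# LANG / LC_* settings cannot reach the service.
run_with_fixed_locale() {
    env -i \
        PATH=/usr/sbin:/usr/bin:/sbin:/bin \
        LANG=en_US.UTF-8 \
        "$@"
}

# Usage (as root; the init script path is an assumption):
#   run_with_fixed_locale /etc/init.d/jenkins restart
```

With a wrapper like this, it no longer matters whose SSH session issues the restart: the daemon always starts with the same locale.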

The Solution

I had to stop and start Jenkins again so that it spoke English. We made plans to upgrade both the OS and Jenkins so we wouldn’t run into this specific bug again. Aside from making sure that Jenkins didn’t accidentally speak French again, we also removed the cleanup script.

In this case, the job was creating rpms using mock. We ran mock with sudo, which meant the rpms were owned by the root user and the jenkins user could not delete them. My solution back then was to use ACLs to give the jenkins user write access to files in the Jenkins workspace folder irrespective of the real owner. You can read my original postmortem in the gluster-infra mailing list archives.

We are currently in the process of changing hosting providers. The fix with ACLs always seemed hacky to me and I wanted to take this chance to remove the ACLs entirely. I’ve just added the jenkins user to the mock group and we build rpms without using sudo. That solves all the problems much more cleanly.

But hey, it brings me great joy to say we had a bug where Jenkins spoke French, which made for a fun day of debugging and fixing.

Getting rpcbind to work without IPv6

This advice is going to be useful to a small subset of folks, but it’s useful nonetheless. With IPv4 addresses nearly exhausted, we should probably not be disabling IPv6, but there are some rare situations where tests depend on IPv4 only. The GlusterFS regression test framework makes a lot of assumptions, and one of them is that the network is always an IPv4 network. Gluster does work with IPv6; however, our tests and related regular expressions haven’t moved to IPv6 yet.

We’re in the process of moving cloud providers. Every time we move, we run into some trouble with server setup, because some of the setup differs between base images from different providers. Every time, rpcbind refuses to start, and every time, we think we’ve figured it out and automated it away. This time, we found a new way it could break!

Generally, this is how you disable IPv6:

  • Add IPV6INIT=no to /etc/sysconfig/network-scripts/ifcfg-eth0
  • Add NETWORKING_IPV6=no to /etc/sysconfig/network
  • Run sysctl net.ipv6.conf.all.disable_ipv6=1
  • Run sysctl net.ipv6.conf.default.disable_ipv6=1
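The sysctl commands above only change the running kernel, so those settings are lost on reboot, and the fix described next involves a reboot. Here is a sketch that applies the whole list as persistent configuration. It assumes bash with GNU sed, that the interface is eth0 as in the list, and a hypothetical root-prefix argument so the script can be tried against a scratch directory first; the drop-in file name 90-disable-ipv6.conf is also an assumption.

```shell
#!/usr/bin/env bash
# Sketch: apply the IPv6-disable steps from the list as persistent config.
# Assumptions: GNU sed, interface eth0, and a hypothetical root prefix
# argument (default "/") so the script can be tried on a scratch tree.

# Set KEY=VALUE in a sysconfig-style file, replacing an existing KEY= line
# or appending one if the key is not present yet.
set_sysconfig_key() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        sed -i "s/^${key}=.*/${key}=${value}/" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

disable_ipv6_config() {
    local root="${1:-/}"
    set_sysconfig_key "$root/etc/sysconfig/network-scripts/ifcfg-eth0" IPV6INIT no
    set_sysconfig_key "$root/etc/sysconfig/network" NETWORKING_IPV6 no
    # The sysctl commands only affect the running kernel; a drop-in file
    # under /etc/sysctl.d makes the same settings apply again at boot.
    mkdir -p "$root/etc/sysctl.d"
    cat > "$root/etc/sysctl.d/90-disable-ipv6.conf" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
}

# Usage on a real host, as root:
#   disable_ipv6_config /
#   sysctl --system    # load the drop-in now instead of waiting for a reboot
```

Because set_sysconfig_key replaces an existing line instead of blindly appending, running the script twice (for example from configuration management) leaves exactly one IPV6INIT and one NETWORKING_IPV6 line.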

After you disable IPv6, rpcbind will fail with the following error:

rpcbind.socket failed to listen on sockets: Address family not supported by protocol

To fix the error, rebuild the initramfs with dracut -v -f and then reboot. This process is described in the Red Hat Knowledgebase and has worked for us in the past.

With the new provider, we ran into the same error despite doing that. We discovered that we also needed to remove every /etc/hosts entry containing ::1. If a hostname lookup returns an IPv6 address, rpcbind tries to make IPv6 connections, and the error looks exactly as though you had not run dracut -v -f.
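A small sketch of that hosts-file change, assuming GNU sed. The function name is hypothetical. It removes only lines whose address field is exactly ::1 (the usual localhost6 entry) rather than every line containing the substring, so an unrelated address such as 2001:db8::10 survives, and it keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop /etc/hosts entries whose address is ::1,
# so that no hostname (e.g. localhost) resolves to an IPv6 address.
remove_ipv6_loopback_entries() {
    local hosts_file="${1:-/etc/hosts}"
    # -i.bak edits in place and keeps the original as "$hosts_file.bak".
    # The address must be the first field, followed by whitespace or EOL.
    sed -i.bak -E '/^[[:space:]]*::1([[:space:]]|$)/d' "$hosts_file"
}

# Usage on a real host, as root:
#   remove_ipv6_loopback_entries /etc/hosts
```

After editing the file, try starting rpcbind again; the backup makes it easy to restore the original entries if something else on the host turns out to need them.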