A Funny Bug

Yesterday, we had a 6-hour long tree-closure. From 11:48 am PST to 16:43 pm PST. Philor closed the trees when h…

Yesterday, we had a 6-hour long tree-closure. From 11:48 am PST to 16:43 pm PST. Philor closed the trees when he noticed the issue. By the time I noticed it, it was much later and luckily, I was in a room with the A-team and Releng. I walked over to catlee and nthomas. They both took a look and we were trying to figure out the problem. Wes and Ryan were helping me later and we just called rail and [mrrrgn][morgan] over to help us sort it out. This is much better than IRC!

Couple of confounding things at that point – It was happening across trees, and it didn’t feel like a code issue. However, Releng are all here and haven’t landed any changes. Morgan and rail identified the problem pretty soon. We were deleting temp files on AWS machines as well instead of only on SCL3 machines. Among the temp files we deleted were pulseaudio files. Deleting these files will break pulseaudio.

Now the funny thing about this bug. We ran into it only now because we had so few commits. Let me say that again, we triggered a bug in the automation because we had so few commits. That is hilarious, probably in a gallows humor kind of way.

Weird IE8 error. Nginx to the rescue!

As a server side developer, I don’t run into IE-specific errors very often. Last month, I ran into a very specific error, which is spectacular by itself. IE8 does not like downloads with cache control headers. The client has plenty of IE8 users and preferred we serve over HTTP for IE8 so that the site worked for sure.

Nginx has a very handy module called ngx_http_browser_module to help! All that I needed to do was less than 10 lines of Nginx config.

location / {     # every browser is to be considered modern     modern_browser unlisted;     # these particular browsers are ancient     ancient_browser "MSIE 6.0" "MSIE 7.0" "MSIE 8.0";     # redirect to HTTP if ancient     if ($ancient_browser) {         return 301 http://$server_name$request_uri;     }     # handle requests that are not redirected     proxy_pass http://127.0.0.1:8080;     proxy_set_header X-Forwarded-For $remote_addr;     proxy_set_header Host $host;     proxy_set_header X-Forwarded-Proto $scheme; } 
It's Magic GIF

Yet another day I’m surprised by Nginx 🙂

Arrrgh! Tracebacks and Exceptions

My colleague asked me to take a look at a logging issue on a server last week. He noticed that the error logs had way too little information about exceptions. In this particular instance, we had switched to Nginx + gunicorn instead of our…

My colleague asked me to take a look at a logging issue on a server last week. He noticed that the error logs had way too little information about exceptions. In this particular instance, we had switched to Nginx + gunicorn instead of our usual Nginx + Apache + mod_wsgi (yeah, we’re weird). I took a quick look this morning and everything looked exactly like they should. I’ve read up more gunicorn docs today than I’ve ever done, I think.

Eventually, I asked my colleague Tryggvi for help. I needed a third person to tell me if I was making an obvious mistake. He asked me if I tried running gunicorn without supervisor, which I hadn’t. I tried that locally first, and it worked! I was all set to blame supervisor for my woes and tried it on production. Nope. No luck. As any good sysadmin would do, I checked if the versions matched and they did. CKAN itself has it’s dependencies frozen, this lead to more confusion in my brain. It didn’t make sense.

I started looking at the Exception in more detail, there was a note about email not working and the actual traceback. Well, since I didn’t actually have a mail server on my local machine, I commented those configs out, and now I just had the right Traceback. A few minutes later, it dawned on me. It’s a Pylons “feature”. The full traceback is printed to stdout if and only if there’s no email handling. Our default configs have an email configured and our servers have postfix installed on them and all the errors go to an email alias that’s way too noisy to be useful (Sentry. Soon). I went and commented out the relevant bits of configuration and voilà, it works!

Palm Face

Image source: Unknown, but provided by Tryggvi 🙂

The story of hgstats

Lately, we’ve had problems with hg at Mozilla. The Developer Services Team added a bunch of instrumentation to the hg webheads to help us track what is going wrong and when, to give us somewhat an early indicator…

tl;dr: I built a thing to see public graphs of hg.mozilla.org called hgstats.

Lately, we’ve had problems with Mercurial at Mozilla. The Developer Services Team added a bunch of instrumentation to the hg webheads to help us track what is going wrong and when, to give us somewhat an early indicator of when things get shot to hell. All of these are on the Mozilla Graphite instance which are behind employee-only LDAP. However, an interesting quirk is that the image rendering is actually available without authentication. As a community Sheriff, I’ve been keeping close watch on hg throughout my shift with images that releng folks or hwine gave me. This gave an indicator of when to close trees so that we don’t end up having everything turn red. On Thursday evening, I was watching the conversation in #vcs on irc.mozilla.org, when bkero mentioned he’d made a dashboard in graphite. It suddenly dawned on me that I could just embed those images onto a page and quickly have a public dashboard!

Armed with a bunch of images from Ben, I created a github pages repo with a lovely theme that’s available by default. I embedded the images onto a static HTML page and suddenly, we had a minimal dashboard. It wouldn’t auto-refresh or let you alter the duration of the graph, but hey, now we had one place for things! This first step took about 15 minutes.

There were two features I had in my mind as must-haves: a) the page must let me change the hours of the graphs (i.e. last 2 hours, last 4 hours, last 8 hours, etc), and b) it should auto-refresh. I’ve looked at backbone several times in the past and I figured this was a good time as any to get cracking on building a backbone.js app.

I started slowly, the first step was, get everything I have right now, rendered with backbone. I spent a lot of frustrating hours trying to get it to work, but couldn’t because of silly mistakes. I haven’t been coding in JS much and it shows 🙂 I think I stayed up until 2 am trying to diagnose it, but I couldn’t. When I woke up in the morning, I spotted the trouble immediately and it was a tiny typo. Instead of <{13371f13f0bf161e7595c2ac5df92e005bed3de1d132ef646d0a44f3a1a9ee62}=, I typed <{13371f13f0bf161e7595c2ac5df92e005bed3de1d132ef646d0a44f3a1a9ee62}$. After that first step, I got the router bit working and I had an app that could dynamically change the range of hours in the graph! I’d met my first goal!

I talked to mdoglio who took a quick look at the code and thought models might be a good idea if I’m dealing with data. I refactored the code again to use models, which cleaned it up quite well! Overnight, I had a pull request from hwine to add another graph as well, which I also made more dynamic.

The hardest bit was getting auto-refresh working. I couldn’t figure out an easy way to solve the problem. Eventually, I ended up with setTimer, but the full credit for the right incandation goes to bwinton.

High Five!

Working with backbone has been great, but I wish the documentation did more than just tell me what each function did. Python’s documentation often gives you more than function’s description, it tells you how you would use it practically. Of course, there quite a few resources that already fill this gap. I found backbonetutorials.com pretty useful. I got most of the basic idea of backbone from the site.

I also submitted it to Webdev Beer and Tell (my first submission!). Mike kindly presented it for me (no, he’s not the real nigelb!) and you can watch the video on Air Mozilla if you have some free time 🙂 I would totally recommend watching the whole video, but if you don’t have a lot of time, skip to 6:37.

This is the first time I’ve built a single-page app, so I’d love feedback (extra points if you can do a code review). The code is on GitHub.

OKFestival Fringe Events

The writeup of the OKFestival is very incomplete, because I haven’t mentioned the fringe events! I attended two fringe events and they both were very good.

The writeup of the OKFestival is very incomplete, because I haven’t mentioned the fringe events! I attended two fringe events and they both were very good.

First, I attended CKANCon right before OKFestival. It was informal and co-located with CSVConf. My best takeaway has been talking to people from the wider community around CKAN. I often feel blind-sided because we don’t have a good view of CKAN. I want to know how a user of a portal built on CKAN feels about the UX. After all, the actual users of open data portals are citizens who get data that they can do awesome things with. I had a good conversation with folks from DKAN about their work and I’ve been thinking about how we can make that better.

I finally met Max! (And I was disappointed he didn’t have a meatspace sticker 😛

The other event I attended was Write the Docs. Ali and Florian came to Berlin to attend the event. It was total surprise running into them at the Mozilla Berlin office. The discussions at the event were spectacular. The talks by by Paul Adams and Jessica Rose were great and a huge learning experience. I missed parts of oncletom’s talk, but the bit I did catch sounded very different to my normal view of documentation.

We had a few discussions around localization and QA of docs which were pretty eye opening. At one of the sessions, Paul, Ali, Florian and I discussed rules of documentation, which turned out pretty good! It was an exercise in patience narrowing them down!

I was nearly exhausted and unable to think clearly by the time Write the Docs started, but managed to face through it! Huge thanks to (among others ) Mikey and Kristof for organizing the event!