Blog

  • The Making of the Equivalent Salary Calculator

    I spoke to Rufus Pollock from OKFN a week ago and he encouraged me to try one of the Get the Data challenges. I decided to build the equivalent salary converter since I’d always been curious to have some way to compute an equivalent salary based on cost of living.

    What’s the data I need?

    The first challenge was to understand what data I needed to solve this problem. I spent a few hours reading the Wikipedia page for Purchasing Power Parity. As someone who originally hated Economics, it took me a while to make some sense of all this. I further branched to reading about the Big Mac Index and Geary-Khamis dollar, among others.

    Finding the Data!

    Since the original challenge itself is called “Get the Data Challenge”, I’ll honestly admit that this was perhaps the most challenging of all the tasks in building this application (XML parsing finished a close second :P). The Wikipedia article on Purchasing Power Parity has links to several data sources, which was a cause for great joy until I discovered all of them led to 404s. I went through some parts of the World Bank data and looked at the UN Data website. I was stuck, not knowing what exactly I was looking for.

    At some point, as I was going through another part of the World Bank data site, I saw something about indicators and decided to poke at it. At one point, I even wondered if I should give up and pick some of the other interesting data available like Physicians per 1000. Finally, I stumbled upon the PPP conversion factor data. I didn’t realize this was the data I needed until a little while later. For someone like me, who’s unfamiliar with the words involved, it’s not easy even recognizing that I’ve found what I was looking for. I exported the data from the World Bank website and decided to have a go at parsing it.

    Parsing the data a.k.a. XML Hell

    76756 lines of XML?! It sent shivers down my spine when I first opened the file. I started off with the lxml module to parse the data. It took me several hours of reading the documentation, and trial and error, to get the hang of the API. I raced to write a quick Python script to take all the data from the XML and give me a CSV with only the data that I wanted. The original XML had much more data than I needed. The script and its CSV output are both on GitHub if you’d like to look. I suspect if you’d like to play with another World Bank dataset, this script might give you a starting point. In retrospect, importing the data directly into Recline DataHub might have been a good idea.
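    For a rough idea of what such an XML-to-CSV step can look like, here is a minimal sketch. It is not the script from GitHub: it uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages, and the record/field layout, the field names, the function name, and the sample values are all assumptions made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed layout: each <record> holds several <field name="..."> children.
# The numbers below are placeholders, not real World Bank figures.
SAMPLE_XML = """<Root>
  <data>
    <record>
      <field name="Country or Area" key="IND">India</field>
      <field name="Year">2011</field>
      <field name="Value">10.0</field>
    </record>
    <record>
      <field name="Country or Area" key="USA">United States</field>
      <field name="Year">2011</field>
      <field name="Value">1.0</field>
    </record>
    <record>
      <field name="Country or Area" key="ABW">Aruba</field>
      <field name="Year">2011</field>
      <field name="Value"></field>
    </record>
  </data>
</Root>"""


def records_to_csv(xml_text):
    """Return CSV text with one row per record that actually has a value."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["code", "country", "year", "value"])
    for record in root.iter("record"):
        # Index the fields of this record by their "name" attribute.
        fields = {f.get("name"): f for f in record.findall("field")}
        country = fields["Country or Area"]
        value = (fields["Value"].text or "").strip()
        if not value:
            continue  # skip records with no data for that year
        writer.writerow(
            [country.get("key"), country.text, fields["Year"].text, value]
        )
    return out.getvalue()


csv_text = records_to_csv(SAMPLE_XML)
print(csv_text)
```

    Dropping the empty records early keeps the CSV small, which matters when the export has far more rows than the app needs.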

    Writing the App

    I strongly believe in MVP when I build an application. My MVP when starting was getting a form working which would let me select a country of origin and a target country and it would calculate the salary for that country. This is what the app can currently do. The original challenge involved showing a map, and clicking on a point in the map would show the equivalent salary for that country. I’m very glad that I decided on an easier to achieve MVP or else I’d have nothing to show right now. The UI is built on Zurb Foundation and the minimal backend is written in Flask. The first iteration of the app didn’t use client-side JavaScript to do the calculation; the form was POSTed to the server for every calculation. Later, I wrote that logic in JavaScript, and it falls back to the server side for folks with JavaScript disabled.
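    The calculation itself is small. A PPP conversion factor is the number of local currency units per international dollar, so one plausible way to do the conversion (a sketch, not necessarily the exact code in the app) is to turn the salary into international dollars and then into the target country's currency. The function name and the factors below are placeholders I made up for illustration.

```python
def equivalent_salary(salary, source_ppp, target_ppp):
    """Convert a salary in source-country currency to target-country currency.

    Both factors are PPP conversion factors, i.e. local currency units
    per international dollar.
    """
    if source_ppp <= 0 or target_ppp <= 0:
        raise ValueError("PPP conversion factors must be positive")
    international_dollars = salary / source_ppp
    return international_dollars * target_ppp


# Placeholder factors for two hypothetical countries, not real data.
PPP = {"A": 20.0, "B": 1.0}

result = equivalent_salary(1_000_000, PPP["A"], PPP["B"])
print(result)  # 50000.0
```

    The result is in the target country's local currency units, which matches what the form currently shows.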

    Over the next 2 weeks, I’d like to try and get a map based on kartograph working on the website. I got as far as being able to display the map, but I couldn’t get click events to fire and I’m trying to figure out what’s wrong (Side Note: If you have any advice related to events on kartograph maps, please leave a comment or catch me on Twitter/IRC). If I have enough time, I’d like to convert the equivalent salary to dollars based on the day’s exchange rate and add a choropleth map to show which country would give the highest equivalent salary normalized to USD (the current results are in local currency units). That’s much more complicated and it’s a stretch goal.

    The data isn’t perfect though

    After all this, I’ll have to add that the data isn’t perfect. The data I currently have is country-level Purchasing Power Parity conversion factor, but having lived in two cities in India, I know that it varies between cities too.

    Overall, I’d have to say this was a fun experience and highly educational 🙂

  • Building the Mozalien

    We all love Mozilla Memes, and there’s some of us who like Reddit. Beltzner started r/MozillaMemes a while back and it was kind of painful to manually post each post onto Reddit for upvotes and discussion. It was painful enough, that we stopped doing that after some time. A few weekends ago, I had some free time and I wanted to write something interesting. That’s when I came up with mozalien.

    Mozalien is a bot that looks at an RSS feed and posts new entries to a given subreddit. Thanks to the authors of Python client libraries for Reddit, it even obeys the Reddit rate-limiting rules! I’ll be running it every day locally to post updates to r/MozillaMemes. It’s still not perfect. For instance, everything posted with mozalien seems to be going into the moderation queue and I’m having to clear it manually (that’s still easier than posting the URL to Reddit manually, so I’m going to bear with it for a bit). Suggestions/Patches welcome!
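    The core loop of a bot like this is short. Below is a minimal sketch of the idea, not mozalien's actual code: it parses an RSS 2.0 feed with the standard library and hands every unseen link to a submit callback. The sample feed, the function names, and the submit hook are hypothetical; a real bot would call a Reddit client library inside submit and would persist the set of seen links between runs.

```python
import xml.etree.ElementTree as ET

# Tiny made-up RSS 2.0 feed used only for this example.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First meme</title><link>http://example.com/1</link></item>
  <item><title>Second meme</title><link>http://example.com/2</link></item>
</channel></rss>"""


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs from the feed that are not in seen_links."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link and link not in seen_links:
            items.append((title, link))
    return items


def post_new_items(feed_xml, seen_links, submit):
    """Call submit(title, link) for each unseen item and remember its link."""
    posted = []
    for title, link in new_items(feed_xml, seen_links):
        submit(title, link)  # hypothetical hook for the Reddit submission
        seen_links.add(link)
        posted.append(link)
    return posted


seen = {"http://example.com/1"}  # links already posted in an earlier run
submitted = []
posted = post_new_items(
    SAMPLE_FEED, seen, lambda title, link: submitted.append((title, link))
)
print(posted)
```

    Running it a second time with the same seen set posts nothing, which is what keeps the bot from spamming the subreddit with duplicates.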

  • Have you fallen prey to misconfiguring Nginx?

    Ever googled for Nginx configuration? For example, how to redirect http://www.example.com to http://example.com? I have, and guess what, most of the top results are wrong or inefficient. All of these are documented in a Pitfalls page on the Nginx wiki. I’m just going to point out the parts of my config I’ve optimized recently with great help from the Nginx wiki.

    Redirect from www to non-www

    Nginx wiki recommends using return, but the version in Ubuntu 10.04 doesn’t seem to support it, so I use this:

    rewrite ^ http://nigelb.me$request_uri? permanent;

    Static files

    Set the Expires and Cache-Control headers with the expires directive. Another thing I do is turn off the access log for static files.

    location ~ ^/(img|js|css)/ {
        expires 30d;
        access_log off;
    }

    Running PHP with Nginx

    Most PHP applications only have an index.php file that needs to be executed; everything else is usually an include.

    location ~ index\.php$ {
        include fastcgi_params;
        fastcgi_pass unix:/tmp/php.socket;
    }

    Remember to place your root directive outside any location block. Then, you can add another route for static files, just so that Nginx can serve them instead of that request going to PHP.

    location ^~ /pub/ { }

    By no means are these meant to be authoritative, and newer versions of Nginx let you use try_files instead of some of what I’ve done.

    TL;DR: Use the Nginx Wiki. RTFM.

    Update: Lyz pointed out that the default config file that comes with the CentOS packages on the Nginx website puts root inside the location block instead of outside. She’s just helped me verify that, and I verified the same problem with the Ubuntu packages from the Nginx website.

  • Screwups are important

    At my day job, one of the tasks that Kiran and I have to do frequently is to watch the job postings on the HasGeek Job Board and reject the ones that don’t conform to our Terms of Service. Last night, Kiran pinged me on IRC with a link to a job that he wanted me to knock off. Usually, I use phpMyAdmin to do this, but I thought I’d turn it into a bash script and started writing a MySQL update query.

    Head in Hands by Alex E Proimos on Flickr

    In retrospect, that’s probably my first mistake right there. A production server is really not the right place to have done this and I don’t even know why I thought it was a good idea. I wrote the query and executed it. Suddenly, I realized that I had screwed up. I had that sinking feeling where you know exactly what went wrong and that it’s entirely your fault. I forgot the WHERE clause in the query and managed to reject every job in the job board. Thankfully, there was a backup handy, from about 10 minutes earlier too. Because of the power outages in North India and the fact that our servers are hosted by E2E Networks in Delhi, I had set up hourly backups earlier in the day. Quickly, I brought down Apache and started restoring from the backup. In about 10 minutes from executing the wrong query, we were back up and running.

    We had two things to take away from this mistake: modifying the database directly should stop, and hourly backups are a good idea. We don’t have a lot of data yet, and hourly backups don’t take a lot of time. I spent all day today writing code so that we don’t ever touch the database manually for this. There’s been a plan to write this code for months, but true motivation came in the form of this embarrassing mistake. Most of us hate admitting our mistakes, but when working on servers, it is essential we move to a culture of blameless post-mortems to fix broken systems and ensure the same mistakes aren’t repeated or at least occur less frequently.
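    To illustrate the kind of guard that would have saved me, here is a minimal sketch. It is not the code I wrote for the job board: it uses an in-memory SQLite database as a stand-in for MySQL, and the table, columns, and function name are made up. The idea is that the WHERE clause lives in code, the id is a bound parameter, and the change is rolled back unless exactly one row was affected.

```python
import sqlite3


def reject_job(conn, job_id):
    """Mark exactly one job as rejected, or roll back and raise."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'rejected' WHERE id = ?", (job_id,)
    )
    if cur.rowcount != 1:
        # Anything other than a single affected row is treated as a bug.
        conn.rollback()
        raise LookupError("expected to update 1 row, got %d" % cur.rowcount)
    conn.commit()


# Hypothetical schema and rows, used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO jobs (id, status) VALUES (?, ?)",
    [(1, "live"), (2, "live"), (3, "live")],
)
conn.commit()

reject_job(conn, 2)
statuses = conn.execute("SELECT id, status FROM jobs ORDER BY id").fetchall()
print(statuses)
```

    Only the job with id 2 changes; asking for an id that does not exist raises an error instead of silently doing nothing.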

  • Getting myself to use h, j, k, and l in vim

    I’ve been using vim for a while now, I guess about 2 years? Heck, I’m even writing this post in vim (yay, jekyll!). After all this, I was still using arrow keys for moving around in vim. I thought they worked for me, until I recently read a blog post about vim. One of the suggestions in there was to map the arrow keys to <nop> in normal mode and insert mode. I’ve tried it out for a week now, and I wonder how I ever used the arrow keys (h, j, k, and l make so much more sense).

    If you’ve always wanted to get rid of your arrow key habit, add this to your .vimrc!

    nnoremap <up> <nop>
    nnoremap <down> <nop>
    nnoremap <left> <nop>
    nnoremap <right> <nop>
    inoremap <up> <nop>
    inoremap <down> <nop>
    inoremap <left> <nop>
    inoremap <right> <nop>

    The one place where I used arrows was for vim completion with ^N and ^P. Quickly, I learned that ^N and ^P can replace the arrow keys there too! It’s still not yet muscle memory, but pretty close.