Blog

  • The Migration – Part I: Database

    This is a series of posts on migration from Apache and MySQL to Nginx+uwsgi and PostgreSQL. In this post, I’ll be detailing the steps we took to migrate the database from MySQL to PostgreSQL, with as little downtime as possible. Please leave a comment if you have suggestions!

    One-time Pre-migration Steps

    UTF-8

    The default encoding on PostgreSQL is SQL_ASCII and you probably want UTF-8. If you don’t know what you want, you want UTF-8 (trust me). The easiest way was to blow away the default cluster and re-create it (thanks, jacobian!):

    sudo pg_dropcluster --stop 9.1 main
    sudo pg_createcluster --start -e UTF-8 9.1 main

    Make PostgreSQL listen on all interfaces

    Edit /etc/postgresql/9.1/main/postgresql.conf and ensure PostgreSQL is listening on all interfaces.

    listen_addresses = '0.0.0.0'

    Allow access to PostgreSQL from the old server

    Edit /etc/postgresql/9.1/main/pg_hba.conf and add an entry for the old server (where 123.123.123.123 is the IP address of the old server).

    host    all             all             123.123.123.123/32       md5
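    PostgreSQL needs to pick up both of these changes; the listen_addresses change in particular requires a restart:

    sudo service postgresql restart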

    Install client libraries on the old server

    We use SQLAlchemy for db access, so I had to apt-get install python-psycopg2 on the old server.
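    A quick way to check that the module is importable afterwards (Python 2, matching the python-psycopg2 package of the time):

    python -c "import psycopg2; print psycopg2.__version__"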

    Creating Users and Databases

    Our process is to create a user for each app and have that app’s database owned by this user. Here’s a script that automates creating the user and database:

    #!/bin/bash
    sudo -u postgres createuser -d -R -S $1
    sudo -u postgres createdb $1 -O $1
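    Saved as, say, createappdb.sh (the file name is mine, not from our setup), it takes the app name as its only argument:

    ./createappdb.sh myapp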

    The move

    Import Preparation

    Create user and database on the new server with the script above. Remember to set a password for this new user.
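    Something like this does the trick (the user name and password here are placeholders):

    sudo -u postgres psql -c "ALTER USER myapp WITH PASSWORD 'sekrit';"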

    Exporting

    The most worrisome bit about the whole migration was exporting the data from MySQL and importing it into PostgreSQL. We used mysql2psql and it didn’t give us much trouble, except for the bit where floats got a little messed up. My personal recommendation is to not use real, but numeric(7,4), with the precision adjusted for what you need (this particular definition is the one we use for our lat/long columns).
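    For instance, a hypothetical coordinates table (the table and column names are made up for illustration, not from our schema):

    -- numeric(7,4) keeps 4 decimal places and values up to 999.9999, plenty for lat/long
    CREATE TABLE places (
        id        serial PRIMARY KEY,
        latitude  numeric(7,4),
        longitude numeric(7,4)
    );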

    First, run mysql2psql on your command line; this will create the config file.

    Now edit the mysql2psql.yml file and add the appropriate entries. Here’s what ours looked like:

    mysql:
      hostname: localhost
      port: 3306
      socket: /var/run/mysqld/mysqld.sock
      username: mysuperuser
      password: mypassword
      database: mydb

    destination:
      # if file is given, output goes to file, else postgres
      file: mydb.sql
      postgres:
        hostname: localhost
        port: 5432
        username: mysql2psql
        password:
        database: mysql2psql_test

    # if tables is given, only the listed tables will be converted.  leave empty to convert all tables.
    #tables:
    #- table1
    #- table2

    # if exclude_tables is given, exclude the listed tables from the conversion.
    #exclude_tables:
    #- table3
    #- table4

    # if supress_data is true, only the schema definition will be exported/migrated, and not the data
    supress_data: false

    # if supress_ddl is true, only the data will be exported/imported, and not the schema
    supress_ddl: false

    # if force_truncate is true, forces a table truncate before table loading
    force_truncate: false

    When you run mysql2psql again, it will export the database mydb into mydb.sql. Before we did that, we removed this particular site from /etc/apache2/sites-enabled and restarted Apache; we didn’t want the SQL file to go stale as soon as it was exported. This is where the downtime starts.
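    Taking the site down was the usual Apache routine; something like this, where mysite is a placeholder for the actual site config name:

    sudo a2dissite mysite
    sudo service apache2 restart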

    Importing

    Copy the file over to the new server and import it into PostgreSQL with psql.

    sudo -u mydb psql mydb < mydb.sql

    In retrospect, I should have just imported it directly with mysql2psql. I was initially hesitant because it involved creating a user that could access that machine from outside. But I later realized I needed it anyway.
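    As the comment in the config above says, leaving file blank makes mysql2psql load straight into the postgres destination. A direct import would have used something like this (host and credentials are placeholders):

    destination:
      # no file: load straight into the postgres destination below
      file:
      postgres:
        hostname: new-db-server
        port: 5432
        username: mydb
        password: sekrit
        database: mydb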

    Go live!

    Now change the settings on the old server to use the PostgreSQL database as the backend, enable the site in Apache, and you’re all set to serve this site from PostgreSQL!
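    For us this boiled down to swapping the database URL in the app’s config and turning the Apache site back on. Schematically (host, credentials, and site name are placeholders, not our real values):

    # old SQLAlchemy URL:  mysql://myapp:sekrit@localhost/mydb
    # new SQLAlchemy URL:  postgresql+psycopg2://myapp:sekrit@new-db-server/mydb

    sudo a2ensite mysite
    sudo service apache2 restart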

  • Moving On

    It’s been about 10 months since I started working at HasGeek, and it’s been an amazing time. I’ve been part of 4 amazing conferences, a workshop, and a bunch of Geekups. Among other things, I’ve written code, organized content, and edited videos. It’s probably the most intense job I’ve ever had.

    When I joined HasGeek last year, I’d committed to a minimum of 6 months. After 10 months at HasGeek, I’m moving on. I’m very excited to announce that, starting Oct 2, I’ll be working for the Open Knowledge Foundation as a Data Wrangler and Web Developer! I’m looking forward to working with the amazing folks at OKFN. As Sunil pointed out, I’m now in the non-profit sector 🙂

  • The Making of the Equivalent Salary Calculator

    I spoke to Rufus Pollock from OKFN a week ago and he encouraged me to try one of the Get the Data challenges. I decided to build the equivalent salary converter, since I’d always been curious to have some way to work out an equivalent salary based on cost of living.

    What’s the data I need?

    The first challenge was to understand what data I needed to solve this problem. I spent a few hours reading the Wikipedia page for Purchasing Power Parity. As someone who originally hated Economics, it took me a while to make some sense of all this. I further branched out into reading about the Big Mac Index and the Geary-Khamis dollar, among others.

    Finding the Data!

    Since the original challenge itself is called “Get the Data Challenge”, I’ll honestly admit that this was perhaps the most challenging of all the tasks in building this application (XML parsing finished a close second :P). The Wikipedia article on Purchasing Power Parity has links to several data sources, which was a cause for great joy until I discovered that all of them led to 404s. I went through some parts of the World Bank data and looked at the UN Data website. I was stuck at not knowing what exactly I was looking for.

    At some point, as I was going through another part of the World Bank data site, I saw something about indicators and decided to poke at it. At one point, I even wondered if I should give up and pick some of the other interesting data available, like Physicians per 1000. Finally, I stumbled upon the PPP conversion factor data. I didn’t realize this was the data I needed until a little while later; for someone like me, unfamiliar with the words involved, it’s not easy to even recognize that I’ve found what I was looking for. I exported the data from the World Bank website and decided to have a go at parsing it.

    Parsing the data a.k.a. XML Hell

    76756 lines of XML?! It sent shivers down my spine when I first opened the file. I started off with the lxml module to parse the data. It took me several hours of reading the documentation, and trial and error, to get the hang of the API. I raced to write a quick Python script to take all the data from the XML and give me a CSV with just the data I wanted; the original XML had much more data than I needed. The script and the CSV output of the script are both on GitHub if you’d like to look. If you’d like to play with another World Bank dataset, I suspect this script might give you a starting point. In retrospect, importing the data directly into Recline DataHub might have been a good idea.
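    The gist of the script is just walking the records and writing out the fields I care about. Here’s a stripped-down sketch (not the actual script; the tag and field names are assumptions about the export format and will need adjusting for the real file):

    import csv
    from lxml import etree

    rows = []
    # iterate over record elements without holding the whole 76756-line file in memory
    for _, record in etree.iterparse('ppp.xml', tag='record'):
        fields = {f.get('name'): f.text for f in record.findall('field')}
        rows.append((fields.get('Country or Area'), fields.get('Year'), fields.get('Value')))
        record.clear()  # free the element once we've read it

    with open('ppp.csv', 'w') as out:
        writer = csv.writer(out)
        writer.writerow(['country', 'year', 'value'])
        writer.writerows(rows)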

    Writing the App

    I strongly believe in an MVP when I build an application. My MVP when starting was a working form that would let me select a country of origin and a target country and calculate the equivalent salary for that country. This is what the app can currently do. The original challenge involved showing a map, where clicking on a point would show the equivalent salary for that country. I’m very glad that I decided on an easier-to-achieve MVP, or else I’d have nothing to show right now. The UI is built on Zurb Foundation and the minimal backend is written in Flask. The first iteration of the app didn’t use client-side JavaScript to do the calculation; the form was POSTed to the server for every calculation. Later, I wrote that logic in JavaScript, with a fallback to the server side for folks who have JavaScript disabled.
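    The calculation itself is straightforward once you have the PPP conversion factors (local currency units per international dollar). Roughly, and not the app’s actual code:

    def equivalent_salary(salary, ppp_source, ppp_target):
        # convert the source salary into international dollars, then into the
        # target country's currency at its own PPP rate
        international_dollars = salary / ppp_source
        return international_dollars * ppp_target

    # e.g. with made-up conversion factors:
    # equivalent_salary(1000000, 17.0, 1.0)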

    Over the next 2 weeks, I’d like to try to get a map based on Kartograph working on the website. I got as far as being able to display the map; however, I couldn’t get click events to fire and I’m trying to figure out what’s wrong (side note: if you have any advice related to events on Kartograph maps, please leave a comment or catch me on Twitter/IRC). If I have enough time, I’d also like to convert the equivalent salary to dollars based on the day’s exchange rate and add a choropleth map showing which country would give the highest equivalent salary normalized to USD (the current results are in local currency units). That’s much more complicated, so it’s a stretch goal.

    The data isn’t perfect though

    After all this, I have to add that the data isn’t perfect. What I currently have is the country-level Purchasing Power Parity conversion factor, but having lived in two cities in India, I know that it varies between cities too.

    Overall, I’d have to say this was a fun experience and highly educational 🙂

  • Building the Mozalien

    We all love Mozilla Memes, and there are some of us who like Reddit. Beltzner started r/MozillaMemes a while back, and it was kind of painful to manually post each post to Reddit for upvotes and discussion. Painful enough that we stopped doing it after some time. A few weekends ago, I had some free time and wanted to write something interesting. That’s when I came up with mozalien.

    Mozalien is a bot that watches an RSS feed and posts new entries to a given subreddit. Thanks to the authors of the Python client libraries for Reddit, it even obeys Reddit’s rate-limiting rules! I’ll be running it locally every day to post updates to r/MozillaMemes. It’s still not perfect; for instance, everything posted with mozalien seems to go into the moderation queue and I’m having to clear it manually (that’s still easier than posting each URL to Reddit by hand, so I’m going to bear with it for a bit). Suggestions/Patches welcome!
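    To give an idea of the shape of the thing, here’s a rough sketch of the approach (not the actual mozalien code; the feed URL, credentials, and the “seen” bookkeeping are all placeholders):

    import feedparser
    import praw

    FEED_URL = 'http://example.com/feed.rss'

    reddit = praw.Reddit(client_id='...', client_secret='...',
                         username='mozalien', password='...',
                         user_agent='mozalien rss poster')

    seen = set()  # the real bot would persist this between runs
    for entry in feedparser.parse(FEED_URL).entries:
        if entry.link in seen:
            continue
        # the client library takes care of Reddit's rate limiting
        reddit.subreddit('MozillaMemes').submit(title=entry.title, url=entry.link)
        seen.add(entry.link)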

  • Have you fallen prey to misconfiguring Nginx?

    Ever googled for Nginx configuration? For example, how to redirect http://www.example.com to http://example.com? I have, and guess what: most of the top results are wrong or inefficient. All of these mistakes are documented on the Pitfalls page of the Nginx wiki. I’m just going to point out the parts of my config that I’ve optimized recently, with great help from the Nginx wiki.

    Redirect from www to non-www

    The Nginx wiki recommends using return, but the version in Ubuntu 10.04 doesn’t seem to support it, so I use this:

    rewrite ^ http://nigelb.me$request_uri? permanent;
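    On a newer Nginx, the wiki-recommended form would be something like:

    server {
        server_name www.nigelb.me;
        return 301 http://nigelb.me$request_uri;
    }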

    Static files

    Set the Expires and Cache-Control headers with the expires directive. Another thing I do is turn off the access log for static files.

    location ~ ^/(img|js|css)/ {
        expires 30d;
        access_log off;
    }

    Running PHP with Nginx

    Most PHP applications have only an index.php file that needs to be executed; everything else is usually an include.

    location ~ index.php$ {
        include fastcgi_params;
        fastcgi_pass unix:/tmp/php.socket;
    }

    Remember to place your root directive outside any location block. Then you can add another location for static files, just so that Nginx serves them directly instead of those requests going to PHP.

    location ^~ /pub/ { }
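    Putting those pieces together, the overall shape of the server block ends up roughly like this (paths and names are placeholders):

    server {
        server_name example.com;
        root /var/www/example;    # root lives here, outside the location blocks

        location ~ index.php$ {
            include fastcgi_params;
            fastcgi_pass unix:/tmp/php.socket;
        }

        # static files under /pub/ are served straight off disk by nginx
        location ^~ /pub/ { }
    }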

    By no means are these meant to be authoritative, and newer versions of Nginx let you use try_files instead of some of what I’ve done.

    TL;DR: Use the Nginx Wiki. RTFM.

    Update: Lyz pointed out that the default config file that comes with the CentOS packages on the Nginx website puts root inside the location block instead of outside. She just helped me verify that, and I found the same problem with the Ubuntu packages from the Nginx website.