The Migration – Part I: Database

This is a series of posts on migration from Apache and MySQL to Nginx+uwsgi and PostgreSQL. In this post, I’ll be detailing the steps we took to migrate the database from MySQL to PostgreSQL, with as little downtime as possible. Please leave a comment if you have suggestions!

One-time Pre-migration Steps

UTF-8

The default encoding on PostgreSQL is SQL_ASCII, and you probably want UTF-8. If you don’t know what you want, you want UTF-8 (trust me). The easiest way was to blow away the default cluster and re-create it (thanks, jacobian!):

sudo pg_dropcluster --stop 9.1 main
sudo pg_createcluster --start -e UTF-8 9.1 main

Make PostgreSQL listen on all interfaces

Edit /etc/postgresql/9.1/main/postgresql.conf and ensure PostgreSQL is listening on all interfaces (by default it listens only on localhost).

listen_addresses = '0.0.0.0'

Allow access to PostgreSQL from the old server

Edit /etc/postgresql/9.1/main/pg_hba.conf and add an entry for the old server (where 123.123.123.123 is the IP address of the old server).

host    all             all             123.123.123.123/32       md5

Install client libraries on the old server

We use SQLAlchemy for DB access, so I had to do apt-get install python-psycopg2.
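Once the library is installed, a quick throwaway check from the old server confirms both the client install and the listen_addresses/pg_hba.conf changes above. This is just a sketch with a placeholder host and credentials (it assumes the user and database from the next section already exist):

import psycopg2

# placeholder host and credentials; substitute the new server's address
conn = psycopg2.connect(host="new.server.example.com", dbname="mydb",
                        user="mydb", password="mypassword")
print(conn.server_version)  # e.g. 90105 for PostgreSQL 9.1.5
conn.close()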

Creating Users and Databases

Our process is to create a user for each app and have that app’s database be owned by that user. Here’s a script that automates creating the user and the database:

#!/bin/bash
# usage: create_db.sh <appname>
sudo -u postgres createuser -d -R -S "$1"  # can create DBs, can't create roles, not a superuser
sudo -u postgres createdb "$1" -O "$1"     # database owned by the new user

The move

Import Preparation

Create the user and database on the new server with the script above. Remember to set a password for this new user (e.g. with the \password command in psql).

Exporting

The most worrisome bit of the whole migration was exporting the data from MySQL and importing it into PostgreSQL. We used mysql2psql, and it didn’t give us much trouble except for the bit where floats got a little messed up. My personal recommendation is to not use real, but numeric(7,4), with the precision adjusted for what you need (this particular definition is the one we use for our lat/long columns).
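For what it’s worth, here’s how that recommendation might look in a SQLAlchemy table definition; the table and column names are just illustrative:

from sqlalchemy import Column, Integer, MetaData, Numeric, Table

metadata = MetaData()

# numeric(7,4) keeps exactly 4 decimal places (roughly 11 m of lat/long
# precision) and avoids the float rounding issues we hit during the export
places = Table("places", metadata,
               Column("id", Integer, primary_key=True),
               Column("lat", Numeric(7, 4)),
               Column("lng", Numeric(7, 4)))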

First, run mysql2psql on the command line; this will create the config file.

Now edit the mysql2psql.yml file and add the appropriate entries. Here’s what ours looked like:

mysql:
  hostname: localhost
  port: 3306
  socket: /var/run/mysqld/mysqld.sock
  username: mysuperuser
  password: mypassword
  database: mydb

destination:
  # if file is given, output goes to file, else postgres
  file: mydb.sql
  postgres:
    hostname: localhost
    port: 5432
    username: mysql2psql
    password:
    database: mysql2psql_test

# if tables is given, only the listed tables will be converted.  leave empty to convert all tables.
#tables:
#- table1
#- table2

# if exclude_tables is given, exclude the listed tables from the conversion.
#exclude_tables:
#- table3
#- table4

# if supress_data is true, only the schema definition will be exported/migrated, and not the data
supress_data: false

# if supress_ddl is true, only the data will be exported/imported, and not the schema
supress_ddl: false

# if force_truncate is true, forces a table truncate before table loading
force_truncate: false

When you run mysql2psql again, it will export the database mydb into mydb.sql. Before we did that, we removed this particular site from /etc/apache2/sites-enabled and restarted Apache; we didn’t want the SQL file to go stale as soon as it was exported. This is where the downtime starts.

Importing

Copy the file over to the new server and import it into PostgreSQL with psql.

sudo -u mydb psql mydb < mydb.sql

In retrospect, I should have just imported it directly with mysql2psql. I was initially hesitant because it involved creating a user that could access that machine from outside. But I later realized I needed it anyway.

Go live!

Now change the settings on the old server to use the PostgreSQL database as the backend, re-enable the site in Apache, and you’re all set to serve this site from PostgreSQL!
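For a SQLAlchemy app like ours, the settings change boils down to swapping the database URL. A sketch, with placeholder credentials and host:

from sqlalchemy import create_engine

# before: create_engine("mysql://mydb:mypassword@localhost/mydb")
engine = create_engine(
    "postgresql+psycopg2://mydb:mypassword@new.server.example.com/mydb")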

The Making of the Equivalent Salary Calculator

I spoke to Rufus Pollock from OKFN a week ago and he encouraged me to try one of the Get the Data challenges. I decided to build the equivalent salary converter, since I’d always been curious to have some way to compute an equivalent salary based on cost of living.

What’s the data I need?

The first challenge was to understand what data I needed to solve this problem. I spent a few hours reading the Wikipedia page for Purchasing Power Parity. As someone who originally hated Economics, it took me a while to make sense of it all. I further branched out to reading about the Big Mac Index and the Geary-Khamis dollar, among others.

Finding the Data!

Since the original challenge itself is called the “Get the Data Challenge”, I’ll honestly admit that this was perhaps the most challenging of all the tasks in building this application (XML parsing finished a close second :P). The Wikipedia article on Purchasing Power Parity has links to several data sources, which was a cause for great joy until I discovered they all led to 404s. I went through some parts of the World Bank data and looked at the UN Data website. I was stuck at not knowing what exactly I was looking for.

At some point, as I was going through another part of the World Bank data site, I saw something about indicators and decided to poke at it. At one point I even wondered if I should give up and pick some of the other interesting data available, like Physicians per 1000. Finally, I stumbled upon the PPP conversion factor data, and even then I didn’t realize it was the data I needed until a little while later. For someone like me, unfamiliar with the terminology involved, it’s not easy even to recognize that I’ve found what I was looking for. I exported the data from the World Bank website and decided to have a go at parsing it.

Parsing the data a.k.a. XML Hell

76,756 lines of XML?! It sent shivers down my spine when I first opened the file. I started off with the lxml module to parse the data, and it took me several hours of reading the documentation, plus trial and error, to get the hang of the API. I raced to write a quick Python script to take all the data from the XML and give me a CSV with just the data I wanted; the original XML had much more data than I needed. The script and its CSV output are both on GitHub if you’d like to look, and if you’d like to play with another World Bank dataset, it might give you a starting point. In retrospect, importing the data directly into Recline DataHub might have been a good idea.
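The general shape of that script, as a minimal sketch: the element and attribute names below are what I recall of the World Bank export layout, so treat them as assumptions and check the actual script on GitHub for the real structure.

import csv
from lxml import etree

tree = etree.parse("ppp_conversion_factor.xml")  # placeholder filename

with open("ppp.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["country", "year", "value"])
    # assumed layout: <record><field name="Country or Area">...</field>...</record>
    for record in tree.iterfind(".//record"):
        fields = {f.get("name"): f.text for f in record.iterfind("field")}
        writer.writerow([fields.get("Country or Area"),
                         fields.get("Year"),
                         fields.get("Value")])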

Writing the App

I strongly believe in an MVP when I build an application. My MVP when starting was a working form that would let me select a country of origin and a target country and calculate the equivalent salary for that country; this is what the app can currently do. The original challenge involved showing a map, where clicking on a point would show the equivalent salary for that country. I’m very glad I decided on an easier-to-achieve MVP, or else I’d have nothing to show right now. The UI is built on Zurb Foundation and the minimal backend is written in Flask (sketched below). The first iteration of the app didn’t use client-side JavaScript for the calculation; the form was POSTed to the server for every calculation. Later, I wrote that logic in JavaScript, falling back to the server side for folks with JavaScript disabled.
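The heart of the backend is the PPP arithmetic: divide the salary by the source country’s PPP conversion factor to get international dollars, then multiply by the target country’s factor. A minimal sketch of what the Flask endpoint might look like, with made-up placeholder factors (the real app reads them from the World Bank data):

from flask import Flask, request, jsonify

app = Flask(__name__)

# PPP conversion factors (LCU per international dollar).
# Placeholder values for illustration only.
PPP = {"IND": 20.0, "USA": 1.0}

@app.route("/convert", methods=["POST"])
def convert():
    salary = float(request.form["salary"])
    source = request.form["source"]  # e.g. "IND"
    target = request.form["target"]  # e.g. "USA"
    # salary -> international dollars -> target local currency units
    equivalent = salary / PPP[source] * PPP[target]
    return jsonify(equivalent=round(equivalent, 2))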

Over the next 2 weeks, I’d like to try to get a map based on kartograph working on the website. I got as far as being able to display the map; however, I couldn’t get click events to fire and I’m still trying to figure out what’s wrong (side note: if you have any advice related to events on kartograph maps, please leave a comment or catch me on Twitter/IRC). If I have enough time, I’d also like to convert equivalent salaries to dollars based on the day’s exchange rate and add a choropleth map showing which country would give the highest equivalent salary normalized to USD (the current results are in local currency units). That’s much more complicated, so it’s a stretch goal.

The data isn’t perfect though

After all this, I have to add that the data isn’t perfect. What I currently have is the country-level Purchasing Power Parity conversion factor, but having lived in two cities in India, I know that it varies between cities too.

Overall, I’d have to say this was a fun experience and highly educational 🙂

Building the Mozalien

We all love Mozilla Memes, and some of us like Reddit. Beltzner started r/MozillaMemes a while back, and it was kind of painful to manually post each meme onto Reddit for upvotes and discussion. Painful enough that we stopped doing it after some time. A few weekends ago I had some free time and wanted to write something interesting, and that’s when I came up with mozalien.

Mozalien is a bot that watches an RSS feed and posts new entries to a given subreddit. Thanks to the authors of the Python client libraries for Reddit, it even obeys Reddit’s rate-limiting rules! I’ll be running it locally every day to post updates to r/MozillaMemes. It’s still not perfect; for instance, everything posted with mozalien seems to go into the moderation queue and I’m having to clear it manually (that’s still easier than posting each URL to Reddit by hand, so I’m going to bear with it for a bit). Suggestions/patches welcome!
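The gist of the bot, as a minimal sketch using feedparser and PRAW; the feed URL and credentials are placeholders, and the actual mozalien code may use different libraries and certainly handles state more carefully:

import feedparser
import praw

feed = feedparser.parse("https://mozillamemes.example.org/feed")  # placeholder URL

reddit = praw.Reddit(client_id="...", client_secret="...",
                     username="mozalien", password="...",
                     user_agent="mozalien feed bot")

posted = set()  # the real bot would persist what it has already posted
for entry in feed.entries:
    if entry.link not in posted:
        # PRAW takes care of Reddit's rate-limiting rules for us
        reddit.subreddit("MozillaMemes").submit(entry.title, url=entry.link)
        posted.add(entry.link)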

Geektionary, Julython, and BrowserID

It’s been quite a while since I blogged, more than 6 months or so. Between a family emergency in December, switching to a MacBook in January, and jekyll not working properly on it (entirely due to my lack of Ruby knowledge/experience), I never found the time. Hopefully, I’ll have more time in the coming weeks (yeah, right :P). On Sunday, I noticed Julython pop up in my Twitter feed. It’s an event that encourages developers of all skill levels to work on their pet project(s) just a little each day, in this instance specifically Python-related projects. This reminded me of Geektionary, a project I started a while back to get the hang of MongoDB. I spent some time plotting, and it seems like a perfect fit for a Julython project: I get to learn lots of new things and take a project from scratch to finish on my own.

At work, we use Flask all the time, but we have our own boilerplate for our Flask apps. For Geektionary, I’m using the large-app how-to on the Flask wiki. I made some modifications to that example and already have something basic up and running. The main change I’ve made is to how configuration is handled: I’ve always liked having a base settings file with a local file overriding it (see the sketch below). Thanks to Julython, I think I’ll be working on Geektionary a little bit each day to make a final product that allows login with BrowserID and all that jazz 😉
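Concretely, the pattern is Flask’s from_object for the base settings plus an optional from_pyfile override; the module and file names here are just my choices:

from flask import Flask

app = Flask(__name__)
app.config.from_object("settings")  # base settings, checked into the repo
# per-machine overrides; silent=True means it's fine if the file is absent
app.config.from_pyfile("local_settings.py", silent=True)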

Speaking of BrowserID, another thing I want to do over this July, if I have time, is to port mkelly’s django-browserid to Flask. It is non-trivial because I need to support multiple DB backends somehow. I have a basic idea of what I want to do in my head, I’m just hoping to find some time to punch it out into code.
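The assertion-verification half is the straightforward part; it’s the DB-backend flexibility that’s hard. For reference, a minimal sketch of verifying an assertion against the BrowserID verifier service using requests:

import requests

VERIFIER = "https://verifier.login.persona.org/verify"

def verify_assertion(assertion, audience):
    # POST the assertion plus our audience (scheme://host:port) to the verifier
    resp = requests.post(VERIFIER,
                         data={"assertion": assertion, "audience": audience})
    data = resp.json()
    # the verifier answers with status "okay" and the user's email on success
    return data["email"] if data.get("status") == "okay" else None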

Write more beautiful Python code

Contributing to more Python projects, like Launchpad, has also taught me about writing PEP8-compliant code; there, one is expected to run make lint and fix any lint errors.

[Screenshot: running pep8. Not real code though, I’m not /that/ bad :P]

A major part of the code I write at my day job consists of scripts, which I most often prefer to write in Python (second only to bash). I hate to admit it, but my code is rarely pretty; I just write hacky scripts. I now run pep8 and pyflakes on all the Python code that I write. The result does look nicer, though it’s occasionally painful to fix the warnings 😉 Almost every time, pyflakes helps me catch an error in the script even before I run it.
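A contrived example of the kind of slip I mean. The snippet below is fine, but had the last line said log instead of logs, pyflakes would have reported “undefined name ‘log’” immediately, instead of a NameError blowing up at runtime:

import glob

def newest_log(log_dir):
    # return the most recent log file in log_dir, or None if there are none
    logs = sorted(glob.glob(log_dir + "/*.log"))
    return logs[-1] if logs else None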