Since October, I’ve been a remote employee, working for the Open Knowledge Foundation. I was nervous about being a remotee, so I talked to a lot of my friends who’re remotees at Mozilla. Shout out to ashish, fox2mike, glob, and Unfocused for helping me out. I also enjoyed reading posts by people about what their teams do, particularly RelEng at Mozilla; shout out to you folks as well! Also, The Oatmeal was right! Although, ironically, I’ve started to wake up unnaturally early since becoming a remote employee 😛
My biggest fear about working from home was distractions. The most important distraction-killer is a time tracker. We use toggl for timesheeting anyway, and turning off the time tracker when I’m distracted helps; after doing that a few times, I now automatically stop myself when I’m getting distracted. I keep two Firefox profiles, one for work and one for everything else. While I’m working, the non-work profile stays closed, so I can’t get distracted. I reward myself with time on it when I finish 2 hours of work and take a short break.
Having good communication channels is great since we’re distributed. Every day, our team gets on a stand-up call. It’s great to actually hear everyone talk about their work and ask the team for help if they’re stuck. We also have a Campfire chat room and an IRC channel (#okfn on irc.freenode.net); they keep me sane. Seriously. Speaking of sanity, on some days the Campfire room is just a world of gifs; we’re awesome like that. There are also the weekly notebook posts to keep track of what folks in other teams do.
Time’s flown by so fast; 10 days ago, I finished a month here! It’s been a fun and busy time!
PS: If you want to work with me at OKFN, we’re hiring for a bunch of positions!
This is a series of posts on migration from Apache and MySQL to Nginx+uwsgi and PostgreSQL. In this post, I’ll be detailing the steps we took to migrate the database from MySQL to PostgreSQL, with as little downtime as possible. Please leave a comment if you have suggestions!
One-time Pre-migration Steps
UTF-8
The default encoding on PostgreSQL is SQL_ASCII and you probably want UTF-8. If you don’t know what you want, you want UTF-8 (trust me). The easiest way was to blow away the default cluster and re-create it (thanks, jacobian!):
sudo pg_dropcluster --stop 9.1 main
sudo pg_createcluster --start -e UTF-8 9.1 main
Make PostgreSQL listen on all interfaces
Edit /etc/postgresql/9.1/main/postgresql.conf and ensure PostgreSQL is listening on all interfaces (a restart of PostgreSQL is needed for this to take effect).
listen_addresses = '0.0.0.0'
Allow access to PostgreSQL from the old server
Edit /etc/postgresql/9.1/main/pg_hba.conf and add an entry for the old server (where 123.123.123.123 is the IP address of the old server), then reload PostgreSQL to pick up the new rule.
host all all 123.123.123.123/32 md5
Install client libraries on the old server
We use sqlalchemy for db access and I had to do apt-get install python-psycopg2.
Creating Users and Databases
Our process is to create a user for each app and have that app’s database be owned by this user; here’s a script that automated creating the user and database.
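The script itself isn’t reproduced here, but a minimal sketch of the idea, using psycopg2 and hypothetical names, looks something like this (run it as the postgres superuser, e.g. sudo -u postgres python create_app_db.py mydb mypassword):

# create_app_db.py -- a sketch, not the actual script; expects trusted input
import sys
import psycopg2

def create_user_and_db(name, password):
    # connect to the maintenance database as the postgres superuser
    conn = psycopg2.connect(dbname='postgres', user='postgres')
    conn.autocommit = True  # CREATE DATABASE can't run inside a transaction
    cur = conn.cursor()
    # identifiers can't be parameterized, so this trusts its arguments
    cur.execute("CREATE USER %s WITH PASSWORD '%s'" % (name, password))
    cur.execute("CREATE DATABASE %s WITH OWNER %s ENCODING 'UTF8'" % (name, name))
    cur.close()
    conn.close()

if __name__ == '__main__':
    create_user_and_db(sys.argv[1], sys.argv[2])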
Create user and database on the new server with the script above. Remember to set a password for this new user.
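At this point it’s worth checking that the old server can actually reach the new database. A quick sanity check from a Python shell on the old server (hypothetical host and credentials):

import psycopg2
conn = psycopg2.connect(host='newserver.example.com', dbname='mydb',
                        user='mydb', password='mypassword')
print(conn.server_version)  # e.g. 90105 for PostgreSQL 9.1.5
conn.close()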
Exporting
The most worrisome bit about the whole migration was exporting the data from MySQL and importing it into PostgreSQL. We used mysql2psql and it didn’t give us much trouble, except for the bit where floats got a little messed up. My personal recommendation is to not use real, but numeric(7,4), with the precision adjusted for what you need (this particular definition is what we use for our lat/long columns).
First, run mysql2psql on your command line; this will create the config file.
Now edit the mysql2psql.yml file and add the appropriate entries. Here’s what ours looked like:
mysql:
  hostname: localhost
  port: 3306
  socket: /var/run/mysqld/mysqld.sock
  username: mysuperuser
  password: mypassword
  database: mydb

destination:
  # if file is given, output goes to file, else postgres
  file: mydb.sql
  postgres:
    hostname: localhost
    port: 5432
    username: mysql2psql
    password:
    database: mysql2psql_test

# if tables is given, only the listed tables will be converted. leave empty to convert all tables.
#tables:
#- table1
#- table2

# if exclude_tables is given, exclude the listed tables from the conversion.
#exclude_tables:
#- table3
#- table4

# if supress_data is true, only the schema definition will be exported/migrated, and not the data
supress_data: false

# if supress_ddl is true, only the data will be exported/imported, and not the schema
supress_ddl: false

# if force_truncate is true, forces a table truncate before table loading
force_truncate: false
When you run mysql2psql again, it will export the database mydb into mydb.sql. Before we did that, we removed this particular site from /etc/apache2/sites-enabled and restarted Apache; we didn’t want the sql file to go stale as soon as it was exported. This is where the downtime starts.
Importing
Copy the file over to the new server and import it into PostgreSQL with psql.
sudo -u mydb psql mydb < mydb.sql
In retrospect, I should have just imported it directly with mysql2psql. I was initially hesitant because it involved creating a user that could access that machine from outside. But I later realized I needed it anyway.
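Going by the comments in the config above, a direct import just means leaving out the file: entry and pointing the postgres section at the new server (hypothetical host and credentials):

destination:
  # no file: entry, so mysql2psql writes straight to postgres
  postgres:
    hostname: newserver.example.com
    port: 5432
    username: mydb
    password: mypassword
    database: mydb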
Go live!
Now change the settings on the old server to use the PostgreSQL database as the backend, re-enable the site in Apache, and you’re all set to serve this site from PostgreSQL!
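For us, that’s just swapping the connection URL sqlalchemy uses. Roughly (hypothetical settings name, host, and credentials; the exact setting depends on the app):

# before the migration:
# SQLALCHEMY_DATABASE_URI = 'mysql://mydb:mypassword@localhost/mydb'
# after, pointing at the new PostgreSQL server:
SQLALCHEMY_DATABASE_URI = 'postgresql://mydb:mypassword@newserver.example.com/mydb'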
It’s been about 10 months since I started working at HasGeek and it’s been an amazing ride. I’ve been part of 4 conferences, a workshop, and a bunch of Geekups. Among other things, I’ve written code, organized content, and edited videos. It’s probably the most intense job I’ve ever had.
When I joined HasGeek last year, I’d committed to a minimum of 6 months. After 10 months at HasGeek, I’m moving on. I’m very excited to announce that starting Oct 2, I’ll be working for the Open Knowledge Foundation as a Data Wrangler and Web Developer! I’m looking forward to working with the amazing folks at OKFN. As Sunil pointed out, I’m now in the non-profit sector 🙂
I spoke to Rufus Pollock from OKFN a week ago and he encouraged me to try one of the Get the Data challenges. I decided to build the equivalent salary converter, since I’d always been curious to have some way to compare salaries across countries based on cost of living.
What’s the data I need?
The first challenge was to understand what data I needed to solve this problem. I spent a few hours reading the Wikipedia page for Purchasing Power Parity. As someone who originally hated Economics, it took me a while to make some sense of all this. I branched out further to reading about the Big Mac Index and the Geary-Khamis dollar, among others.
Finding the Data!
Since the original challenge itself is called “Get the Data Challenge”, I’ll honestly admit that this was perhaps the most challenging of all the tasks in building this application (XML parsing finished a close second :P). The Wikipedia article on Purchasing Power Parity has links to several data sources, which was a cause for great joy until I discovered that all of them led to 404s. I went through some parts of the World Bank data and looked at the UN Data website. I was stuck at not knowing what exactly I was looking for.
At some point, as I was going through another part of the World Bank data site, I saw something about indicators and decided to poke at it. At one point, I even wondered if I should give up and pick some of the other interesting data available like Physicians per 1000. Finally, I stumbled upon the PPP conversion factor data. I didn’t realize this was the data I needed until a little while later. For someone like me, who’s unfamiliar with the words involved, it’s not easy even recognizing that I’ve found what I was looking for. I exported the data from the World Bank website and decided to have a go at parsing it.
Parsing the data a.k.a. XML Hell
76756 lines of XML?! It sent shivers down my spine when I first opened the file. I started off with the lxml module to parse the data. It took me several hours of reading the documentation, and trial and error, to get the hang of the API. I raced to write a quick Python script to pull just the fields I wanted out of the XML and write them to a CSV; the original XML had much more data than I needed. The script and its CSV output are both on GitHub if you’d like a look. If you’d like to play with another World Bank dataset, I suspect this script might give you a starting point. In retrospect, importing the data directly into Recline DataHub might have been a good idea.
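The actual script is in the repo; as a compressed sketch of the approach, assuming the World Bank export is a flat list of record elements with named fields (check the tag and field names against your own file), it boils down to something like:

# sketch only -- tag and field names depend on the World Bank export format
import csv
from lxml import etree

with open('ppp.csv', 'w') as out:
    writer = csv.writer(out)
    writer.writerow(['country', 'year', 'value'])
    # iterparse keeps memory use flat on a 76756-line file
    for _, record in etree.iterparse('ppp.xml', tag='record'):
        fields = dict((f.get('name'), f.text) for f in record.findall('field'))
        if fields.get('Value'):  # skip years with no data
            writer.writerow([fields['Country or Area'], fields['Year'], fields['Value']])
        record.clear()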
Writing the App
I strongly believe in MVP when I build an application. My MVP here was getting a form working that would let me select a country of origin and a target country, and calculate the equivalent salary for the target country. This is what the app can currently do. The original challenge involved showing a map where clicking on a point would show the equivalent salary for that country. I’m very glad that I decided on an easier-to-achieve MVP, or else I’d have nothing to show right now. The UI is built on Zurb Foundation and the minimal backend is written in Flask. The first iteration of the app didn’t use client-side JavaScript to do the calculation; the form was POSTed to the server for every calculation. Later, I wrote that logic in JavaScript, falling back to the server side for folks with JavaScript disabled.
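The calculation itself is tiny. Here’s a stripped-down sketch of the server-side version (hypothetical route and form field names; the sample PPP factors below are made up, the real ones come from the World Bank CSV):

from flask import Flask, request

app = Flask(__name__)

# made-up sample factors (local currency units per international dollar)
PPP = {'India': 16.0, 'United States': 1.0}

@app.route('/convert', methods=['POST'])
def convert():
    salary = float(request.form['salary'])
    source = request.form['source_country']
    target = request.form['target_country']
    # divide by the source factor to get international dollars,
    # then multiply by the target factor to get local currency units
    equivalent = salary / PPP[source] * PPP[target]
    return 'Equivalent salary: %.2f' % equivalent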
Over the next 2 weeks, I’d like to try and get a map based on kartograph working on the website. I got as far as being able to display the map; however, I couldn’t get click events to fire and I’m trying to figure out what’s wrong (side note: if you have any advice related to events on kartograph maps, please leave a comment or catch me on twitter/IRC). If I have enough time, I’d also like to add a choropleth map showing which country would give the highest equivalent salary when normalized to USD at the day’s exchange rate (the current results are in local currency units). That’s much more complicated, so it’s a stretch goal.
The data isn’t perfect though
After all this, I’ll have to add that the data isn’t perfect. What I currently have are country-level Purchasing Power Parity conversion factors, but having lived in two cities in India, I know that purchasing power varies between cities too.
Overall, I’d have to say this was a fun experience and highly educational 🙂
We all love Mozilla Memes, and some of us like Reddit. Beltzner started r/MozillaMemes a while back, and it was kind of painful to manually post each meme onto Reddit for upvotes and discussion. It was painful enough that we stopped doing it after some time. A few weekends ago, I had some free time and wanted to write something interesting. That’s when I came up with mozalien.
Mozalien is a bot that looks at an RSS feed and posts new entries to a given subreddit. Thanks to the authors of the Python client libraries for Reddit, it even obeys Reddit’s rate-limiting rules! I’ll be running it locally every day to post updates to r/MozillaMemes. It’s still not perfect; for instance, everything posted with mozalien seems to be going into the moderation queue and I’m having to clear it manually (that’s still easier than posting each URL to Reddit by hand, so I’m going to bear with it for a bit). Suggestions/patches welcome!
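If you’re curious, the core of the idea is small. A rough sketch using feedparser and praw; the feed URL, credentials, and naive seen-file tracking here are placeholders, not mozalien’s actual code:

# rough sketch of the idea, not mozalien's actual code
import feedparser
import praw

FEED_URL = 'http://example.com/memes/rss'  # placeholder feed

r = praw.Reddit(user_agent='mozalien-sketch')
r.login('botusername', 'botpassword')

try:
    seen = set(line.strip() for line in open('seen.txt'))
except IOError:
    seen = set()

for entry in feedparser.parse(FEED_URL).entries:
    if entry.link not in seen:
        # praw sleeps between API calls to respect reddit's rate limits
        r.submit('MozillaMemes', entry.title, url=entry.link)
        open('seen.txt', 'a').write(entry.link + '\n')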