Have you fallen prey to misconfiguring Nginx?

Ever googled for Nginx configuration? For example, how to redirect http://www.example.com to http://example.com? I have, and guess what, most of the top results are wrong or inefficient. These common mistakes are documented on the Pitfalls page of the Nginx wiki. I’m just going to point out the parts of my config I’ve optimized recently, with great help from the Nginx wiki.

Redirect from www to non-www

The Nginx wiki recommends using return, but the Nginx version in Ubuntu 10.04 doesn’t seem to support it, so I use this instead:

rewrite ^ http://nigelb.me$request_uri? permanent;
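On newer Nginx versions that do support return, the wiki-recommended form is a separate server block that issues the redirect directly (a sketch; substitute your own domain):

```nginx
server {
    listen 80;
    server_name www.nigelb.me;
    # 301 = permanent redirect; $request_uri keeps the path and query string
    return 301 http://nigelb.me$request_uri;
}
```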

Static files

Set the Expires and Cache-Control headers with the expires directive. Another thing I do is turn off the access log for static files.

location ~ ^/(img|js|css)/ {
    expires 30d;
    access_log off;
}

Running PHP with Nginx

Most PHP applications only have an index.php file that needs to be executed, everything else is usually an include.

location ~ index.php$ {
    include fastcgi_params;
    fastcgi_pass unix:/tmp/php.socket;
}

Remember to place your root directive outside any location block. Then you can add another location for static files, so that Nginx serves them directly instead of passing the request on to PHP.

location ^~ /pub/ { }

By no means is this meant to be authoritative, and newer versions of Nginx let you use try_files instead of some of what I’ve done.
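A try_files-based version might look something like this (a sketch, assuming the single-index.php layout described above; the socket path matches the earlier example):

```nginx
location / {
    # Serve the file or directory if it exists; otherwise route to index.php
    try_files $uri $uri/ /index.php$is_args$args;
}

location = /index.php {
    include fastcgi_params;
    fastcgi_pass unix:/tmp/php.socket;
}
```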

TL;DR: Use the Nginx Wiki. RTFM.

Update: Lyz pointed out that the default config files that come with the CentOS packages on the Nginx website put root inside the location block instead of outside. She helped me verify that, and I confirmed the same problem with the Ubuntu packages from the Nginx website.

Screwups are important

At my day job, one of the tasks that Kiran and I have to do frequently is to watch the job postings on the HasGeek Job Board and reject the ones that don’t conform to our Terms of Service. Last night, Kiran pinged me on IRC with a link to a job that he wanted me to knock off. Usually, I use phpMyAdmin to do this, but I thought I’d turn it into a bash script and started writing a MySQL update query.

Head in Hands by Alex E Proimos on Flickr

In retrospect, that’s probably my first mistake right there. The production server is really not the right place to have done this, and I don’t even know why I thought it was a good idea. I wrote the query and executed it. Suddenly, I realized that I had screwed up. I had that sinking feeling where you know exactly what went wrong and that it’s entirely your fault. I had forgotten the WHERE clause in the query and managed to reject every job on the job board. Thankfully, there was a backup handy, from about 10 minutes earlier too. Because of the power outages in North India and the fact that our servers are hosted by E2E Networks in Delhi, I had set up hourly backups earlier in the day. I quickly brought down Apache and started restoring from the backup. Within about 10 minutes of executing the wrong query, we were back up and running.
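For anyone tempted to run ad-hoc updates by hand, one habit that would have caught this mistake (a sketch; the table and column names are made up, not HasGeek’s actual schema, and it assumes InnoDB tables, since MyISAM ignores transactions):

```sql
-- Start a transaction so the change can still be undone
BEGIN;

UPDATE jobposts SET status = 'rejected' WHERE id = 1234;

-- MySQL reports the affected-row count here. If it is not what you
-- expected (say, every row in the table), issue ROLLBACK instead.
COMMIT;
```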

We had two takeaways from this mistake – modifying the database directly has to stop, and hourly backups are a good idea. We don’t have a lot of data yet, so hourly backups don’t take a lot of time. I spent all day today writing code so that we never have to touch the database manually for this. There’s been a plan to write this code for months, but true motivation came in the form of this embarrassing mistake. Most of us hate admitting our mistakes, but when working on servers, it is essential that we move to a culture of blameless post-mortems, to fix broken systems and ensure the same mistakes aren’t repeated, or at least occur less frequently.

32-bit binary arrgh

Running 32-bit binaries on a 64-bit machine and they refuse to work? Over the last month this has happened to me several times, sometimes even disguised as something else. I finally figured out the solution and noted it down all over the place so I don’t forget.

Building OpenCity

The magic words are libc6-i386 and ia32-libs. Oh, and to check whether a binary is 32-bit or 64-bit, use the file command.
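A quick sketch of the check (the binary path is just an example):

```shell
# Inspect a binary; the output says "ELF 32-bit" or "ELF 64-bit"
file /bin/ls

# On a 64-bit Ubuntu of that era, the fix was to install the 32-bit
# compatibility libraries (ia32-libs has since been replaced by
# multiarch packages on newer releases):
#   sudo apt-get install libc6-i386 ia32-libs
```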

Amazon RDS Timezone Hack

‘Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business.’ – Amazon RDS Website

Server room by Torkild Retvedt

Amazon RDS uses UTC by default. Personally, I think systems should use UTC by default, but at my workplace, all systems are in IST, which means we want RDS to work in IST too. So, we have this hack of setting the init_connect parameter to set time_zone = 'Asia/Kolkata';. This works great, except when you have to restart RDS. We’ve had to restart our instance twice so far, both times leading to considerable downtime, pain, and grief because of this hack (thankfully, the RDS machine is a backup machine and not used in production).
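Concretely, the hack makes MySQL run the following statement at the start of every new connection (with the caveats that init_connect is skipped for users with the SUPER privilege, and that named time zones only work if the MySQL time-zone tables have been loaded):

```sql
SET time_zone = 'Asia/Kolkata';
```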

When it happened the first time, my ex-colleague did the modification, and all I noted down was to remove the init_connect parameter. When it happened the second time, I remembered, and set the init_connect parameter to an empty string, which I thought would work. Unfortunately, it didn’t. We talked to Amazon Support, and now I know that when it happens, I should not be setting it to an empty value, but resetting it instead:

rds-reset-db-parameter-group my-params --parameters 'name=init_connect,method=immediate' 

This post is for all those poor souls who might make the same mistake as me as well as a note to myself (though I suspect I’ll never forget this lesson :P) 🙂

What I learned about the cloud

My day job primarily involves maintaining a bunch of Ubuntu servers. What the last few months have taught me is to plan for failure. With the ‘cloud’ being everywhere, we’ve probably been lulled into a false sense of security. I have three anecdotes to share from my brief experience.

One – One of our database instances needed to be restarted. After the restart, we couldn’t connect to it. It took a few hours for Support to get back to us (it was an Amazon RDS instance) and figure out what the problem was. Our init_connect parameter, in which we had put a hack for the timezone, was causing the failure to connect after a restart.

Two – One machine randomly died due to hardware failure on the host. Luckily, I had just launched a new instance which was meant to replace it eventually. Within a few minutes, I switched the IP address with the new instance. Thankfully, there was no service disruption.

Three – I have a VPS with a small provider. This is the VPS that powers this blog and my IRC session. In the first week of July, the provider notified me that there was a targeted network attack on two of their hosts (one of which hosts my VPS) and that they’d be power-cycling the hosts several times a day. This, of course, brought down my website (for a short while) and my IRC session (until I manually restarted it). Note that my website is not high-availability or hosted with one of the major providers.

I don’t blame the providers for any of these failures. It is, and always will be, the responsibility of the customer to make sure there are backups and disaster recovery plans in place, because the only thing servers consistently do is fail. It may be after an hour, a week, a month, or a decade. But they fail. Eventually.

At a recent conference I attended, there was a whole session about planning for failure. This may include making sure you have backup servers, that new servers can be brought up quickly in an automated manner, that there is no dependency on a single provider or service, that the application gracefully handles being unable to reach another machine, and much more. I’ve agonizingly gone over disaster scenarios over the past few days, situations in which any of the servers goes down, whether it be the app server, DB server, monitoring server, load balancer, or even the entire data center, and in conclusion, all I have to say is ‘Prepare for failure.’