Blog

  • Amazon RDS Timezone Hack

    ‘Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business.’ – Amazon RDS Website

    Server room by Torkild Retvedt

    Amazon RDS uses UTC by default. Personally, I think systems should use UTC by default, but at my workplace, all systems are in IST, which means we want RDS to work in IST too. So, we have this hack of setting the init_connect parameter to set time_zone = 'Asia/Kolkata';. This works great, except when you have to restart RDS. We’ve had to restart our instance twice so far, and both times this hack caused considerable downtime, pain, and grief (thankfully, the RDS machine is a backup machine and not used in production).

    When it happened the first time, my ex-colleague did the modification, and all I noted down was to remove the init_connect parameter. When it happened the second time, I remembered this and set the init_connect parameter to an empty string, which I thought would work. Unfortunately, it didn’t. We talked to Amazon Support, and now I know that when it happens, I should not set it to an empty value, but reset it instead:

    rds-reset-db-parameter-group my-params --parameters 'name=init_connect,method=immediate' 
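For context, the hack itself would be applied with the companion modify command from the same legacy RDS command-line tools. This is a sketch: the parameter-group name my-params matches the reset command above, but the exact flag syntax is an assumption based on that toolset, so check against the RDS CLI reference.

```shell
# Sketch (legacy RDS CLI): set init_connect so every new session uses IST.
# "my-params" is the parameter group name; adjust to yours.
rds-modify-db-parameter-group my-params \
  --parameters "name=init_connect,value=SET time_zone='Asia/Kolkata',method=immediate"
```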

    This post is for all those poor souls who might make the same mistake as me, as well as a note to myself (though I suspect I’ll never forget this lesson :P) 🙂

  • Why work doesn’t happen at work

    Often, I’ve noticed that my most productive time is actually at home. So, when I have a few hours of solid work to do in which I don’t need to depend on a teammate and no teammate depends on me, I work from home. I get a lot more done thanks to fewer interruptions.

  • What I learned about the cloud

    My day job primarily involves maintaining a bunch of Ubuntu servers. What the last few months have taught me is to plan for failure. With the ‘cloud’ being everywhere, we’re probably lulled into a false sense of security. I have three anecdotes to share from my brief experience.

    One – One of our database instances needed to be restarted. After the restart, we couldn’t connect to it. It took a few hours for Support to get back to us (it was an Amazon RDS instance) and figure out what the problem was. Our init_connect parameter, which held a hack for the timezone, was causing the failure to connect after a restart.

    Two – One machine randomly died due to hardware failure on the host. Luckily, I had just launched a new instance which was meant to replace it eventually. Within a few minutes, I switched the IP address with the new instance. Thankfully, there was no service disruption.

    Three – I have a VPS with a small provider. This is the VPS that powers this blog and my IRC session. In the first week of July, the provider notified me that there was a targeted network attack happening on two of their hosts (one of which hosts my VPS) and that they’d be power cycling the hosts several times a day. This, of course, brought down my website (for a short while) and my IRC session (until I manually restarted it). Note that my website is not high-availability or hosted with one of the major providers.

    I don’t blame the providers for any of these failures/issues. It is and always will be the responsibility of the customer to make sure there are backups and disaster recovery plans in place, because the only thing servers consistently do is fail. It may be after 1 hour, 1 week, 1 month, or a decade. But they fail. Eventually.

    At a recent conference I attended, there was a whole session about planning for failure. This may include making sure you have backup servers, that new servers can be brought up quickly in an automated manner, that there is no dependency on a single provider or service, that the application gracefully handles not being able to reach another machine, and much more. Over the past few days, I’ve agonizingly gone over disaster scenarios: situations in which any of our servers goes down, whether it be the App server, DB server, Monitoring server, Load Balancer, or even the entire data center. In conclusion, all I have to say is ‘Prepare for failure.’
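The ‘handle not being able to reach another machine gracefully’ part can be as simple as wrapping remote calls in a retry with exponential backoff. Here is a minimal, generic sketch (the flaky example function is hypothetical, standing in for any call to a database or another server):

```python
import random
import time

def retry(operation, attempts=5, base_delay=0.5):
    """Call operation(), retrying on failure with exponential backoff.

    A minimal sketch of 'plan for failure': any call that crosses the
    network is assumed to fail sometimes, so give it a few chances
    before surfacing the error.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the failure surface
            # Back off exponentially, with a little jitter so many
            # clients don't all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Hypothetical flaky dependency: fails twice, then succeeds.
calls = {"n": 0}
def flaky_db_ping():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("db unreachable")
    return "connected"

print(retry(flaky_db_ping, base_delay=0.01))  # → connected
```

Real code would catch only the specific exceptions that are worth retrying (timeouts, connection resets), not everything.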

  • Shooting myself in the foot with Apparmor

    The other day at work, I was setting up a new database server. This is the first time in a while we’ve done this, and almost no one remembers who did it the last time, or how. Our data is kinda big, so we tend to put the mysql data files on an EBS volume by themselves, so that the data always stays separate from the machine and we get as much space as we want. We created the new machine and the new disk, changed the path of the data directories, and started mysql. BAM! It threw a whole bunch of errors about permissions.

    I went in and checked the ownership, which seemed correct, but chowned everything again anyway. Tried again. Nope, didn’t work. Out of frustration, I tried once more after doing a chmod -R 777. Still failed. For a while, we googled extensively for the error, which gave us nothing much to go on. Before this, we had some backup work to do, so I think it was close to 1 am when we actually got down to troubleshooting. After some time, we had the sense to google for what we wanted to accomplish, which led me to apparmor.

    Then my memory of Apparmor and what it does kicked in. I figured out that mysql probably didn’t have permission to use the new directories. We gave it permission and it worked! But we ended up not having enough time to restore data onto this new server and rotate out the old one. Overall, we were working on this from 12 am to 4 am. The next day, my QOTD was from my friend, who shall not be named – ‘Oops. That said, it’s happened to me, too. The irony bit is that I’m one of the primary upstream apparmor devs.’
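For anyone hitting the same wall, the fix looks roughly like this on Ubuntu. The data path /data/mysql is an assumption standing in for wherever your EBS volume is mounted; the profile file /etc/apparmor.d/usr.sbin.mysqld is where Ubuntu’s mysql AppArmor profile normally lives.

```shell
# Sketch: let mysqld use a relocated data directory under AppArmor.
# Assumes the data now lives at /data/mysql -- adjust to your mount point.

# 1. Add rules for the new path inside the profile block in
#    /etc/apparmor.d/usr.sbin.mysqld:
#
#        /data/mysql/ r,
#        /data/mysql/** rwk,

# 2. Reload the profile so the change takes effect:
sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.mysqld

# 3. Restart mysql:
sudo service mysql restart
```

The telltale sign it’s AppArmor rather than file permissions is DENIED lines for mysqld in the kernel log, even when ownership and mode bits are correct.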

  • RTFD and Summit

    Writing documentation isn’t easy. And maintaining up-to-date documentation isn’t easy either. readthedocs.org is a Django project which was written as part of Django Dash. It is backed by RevSys, the Python Software Foundation, and Mozilla Webdev. We can write our docs in Sphinx and import them into Read the Docs.
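A Sphinx project is mostly a conf.py plus .rst files. As a rough illustration of how little is needed, here is a minimal conf.py sketch; the project name and values are illustrative, not Summit’s actual configuration.

```python
# Minimal Sphinx conf.py sketch -- values here are illustrative,
# not the real Summit configuration.
project = "Summit"
copyright = "2011, Summit contributors"
version = "0.1"
release = "0.1"

master_doc = "index"       # root document (index.rst)
source_suffix = ".rst"     # docs are written in reStructuredText
extensions = []            # e.g. "sphinx.ext.autodoc" for API docs
html_theme = "default"
```

With an index.rst next to it, `sphinx-build` (or Read the Docs, after importing the repository) renders the HTML from this.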

    I’ve just got it set up for Summit. New contributors to Summit can see its developer documentation at summit.rtfd.org.