Signalling Problems

The Problem

We recently needed to start and stop Redis programmatically, on an arbitrary port. Unfortunately, the version of the Redis server that we’re using doesn’t support specifying this information as command line arguments. The two options we had were to either write out a temporary config file and point redis-server to that, or to pipe the config in to the redis-server command. We chose the latter, as writing temporary config files all over the place seemed messy.

Our first try was like so:

pid = spawn("echo #{server_config} | redis-server -") #start
Process.kill("TERM", pid) #stop

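Unfortunately, because the command contains a pipe, spawn runs it through a shell and returns the shell’s pid. The Process.kill call therefore only terminates the shell process, and not the actual redis-server process.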

The Solution

A solution was found in the form of process groups. We added an option to the spawn call, making it put the spawned process in a process group:

pid = spawn("echo '#{server_config}' | redis-server -", :pgroup => true)

The created process group has an ID that’s the same as the process ID returned by the spawn call. Now we just needed to find a way to send a signal to all the processes in a certain group.

This turned out to be easy, as Process.kill treats negative arguments as process group IDs instead of process IDs:

Process.kill("TERM", -pid)

So now we can spawn a process without having to worry about tracking the sub-processes it spawns, and still be able to clean them all up at a later time.
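Putting the two halves together, here is a minimal sketch of a start/stop pair. The helper names start_group and stop_group are made up for illustration, and the example uses a harmless sleep pipeline in place of redis-server so that it runs anywhere with a POSIX shell:

```ruby
# Hypothetical helpers; the names are illustrative, not part of the original code.
def start_group(command)
  # :pgroup => true makes the spawned shell the leader of a new process
  # group, so the group ID equals the pid that spawn returns.
  spawn(command, :pgroup => true)
end

def stop_group(pid, signal = "TERM")
  # A negative pid sends the signal to every process in that group.
  Process.kill(signal, -pid)
  # Reap the shell so it doesn't linger as a zombie, and return its status.
  Process.wait(pid)
  $?
end

# Stand-in for "echo config | redis-server -": a shell plus two children.
pid = start_group("sleep 30 | sleep 30")
status = stop_group(pid)
```

The returned status reports how the shell ended, which is handy when you want to confirm that the signal, rather than a normal exit, stopped it.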

CSS Selectors

How much thought do we, as developers, give to front-end design?

And when we do give it thought, do we keep our CSS selectors as tidy and well refactored as the rest of our code?

The following article gives some useful pointers on selector intent which may just help the next time you go and add to your CSS.

Locale Hell

Having just installed MongoDB on an Ubuntu instance running on Amazon EC2, I was expecting everything to just work out of the box. Unfortunately this just wasn’t the case and I received a very helpful message:

terminate called after throwing an instance of 'std::runtime_error'
what(): locale::facet::_S_create_c_locale name not valid

A quick search on Google turned up that I was not the first person with this issue, and that the solution was to check that my default locale was correctly set in the default file:


Then run the locale-gen command, which will correct everything and set up your required locale so you can run MongoDB.  FAIL..

What took me the next hour to find, and is not in the Ubuntu help, is that locale-gen takes a parameter which is the locale to be installed, so you need to run (for my system):

locale-gen en_GB.UTF-8

Which I found on the lovely site at

David Henry / Developer / Reevoo 

The Daemons are Dead (long live the Daemons)

We have several processes at Reevoo that need to be run round the clock. They’re pretty long running but variable, so aren’t really suitable for running with cron. The solution to this is to background them, but how do we run background tasks that need access to our Rails application’s models?

We settled on the Daemons gem, a gem originally hosted on RubyForge that uses Process::fork to fork your daemon code in the background and close all open file descriptors, making Ruby code behave as a standard UNIX daemon. Unfortunately the Daemons gem is pretty old now (it doesn’t look like it’s been updated since March 2008) and is missing a few essential features.

Firstly, we found that in certain situations the daemon won’t respond to a standard SIGTERM, which makes managing it with most monitoring systems a real pain. It can also cause deployment headaches when you have to manually kill -9 various processes and clear up their log files on every code update. Sysadmin headache!

This problem has thankfully been solved by the use of RapLeaf’s daemon_extension code, which is basically a bundle of hacks to kill -9 a daemon that refuses to die after a certain timeout period. This isn’t perfect by any stretch of the imagination, but from a pragmatist’s point of view: It’ll do!
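The idea is simple enough to sketch. This is not the daemon_extension code itself, just a minimal illustration of the TERM-then-KILL approach; stop_daemon, alive? and the timeout defaults are names and values made up for this example:

```ruby
# Hypothetical helper: is there still a process with this pid?
def alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence
  true
rescue Errno::ESRCH
  false
end

# Ask politely with TERM, then fall back to KILL once the timeout expires.
def stop_daemon(pid, timeout = 5, interval = 0.1)
  Process.kill("TERM", pid)
  deadline = Time.now + timeout
  while Time.now < deadline
    return :terminated unless alive?(pid)
    sleep interval
  end
  Process.kill("KILL", pid)
  :killed
rescue Errno::ESRCH
  # The process was already gone when we tried to signal it.
  :not_running
end
```

In real use the pid would come from the daemon’s pid file. Note that a child of the calling process stays visible to alive? as a zombie until it is reaped, so this sketch suits detached daemons better than direct children.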

The other main problem is the configurability of the log and pid file directories. Previously you could only create your daemon with one directory configured and everything would live there. We like to store our applications’ pid files in shared memory (/dev/shm on our RHEL and CentOS boxes). This is basically a paranoia check to ensure that if any runaway code ever forces the machine to reboot, we have no stale pid file issues and Monit can start all the applications back up with minimal sysadmin intervention. Obviously storing daemon logs in shared memory is not optimal, so the two directories need to be configurable separately.

And so to the point of this article: I’ve patched the Daemons library with the pid directory fix, rolled in the daemon_extensions functionality (bear in mind this has only been tested on RedHat and Mac OS X) and pushed the result to my GitHub, which you can grab here.

Fork away! And please send me pull requests :) let’s keep the daemons gem functionality going strong.

Problems with Cookie testing in Rails 2.3

Pre Rails 2.3, when you wanted to set the value of a cookie in a test you had to:

def test_should_set_user_name
  # set cookie value
  @request.cookies['visitor_name'] = CGI::Cookie.new('visitor_name', 'Dave')

  post :login, :name => 'Dan'
  assert_equal 'Dan', cookies['visitor_name'].value
end

But in 2.3 you can forget about all that CGI::Cookie crap and concentrate on the sweet code you actually want to test:

def test_should_set_user_name
  # set cookie value
  @request.cookies['visitor_name'] = 'Dave'

  post :login, :name => 'Dan'
  assert_equal 'Dan', cookies['visitor_name'].value
end

The problem comes when you want to ensure that ‘Dave’ has been logged out. Consider this test:

def test_should_log_user_out
  @request.cookies['visitor_name'] = 'Dave'
  post :logout
  assert_nil cookies['visitor_name']
end

This test always passes! No matter whether the cookie is removed in the logout action or not.

The solution

'cookies' is defined in ActionController::TestProcess as:

def cookies
  @response.cookies
end

When you ask for cookies in your test, what you’re actually getting is @response.cookies, which doesn’t include the values of the cookies set in the request. This means that cookies['visitor_name'], as far as the test is concerned, is never set, making it pass incorrectly.

I’ve written a simple patch which has recently been merged into Rails. It merges the @response cookies with the @request cookies and returns that, rather than just returning @response.cookies. So ActionController::TestProcess becomes:

def cookies
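As a rough illustration of the merge described above, here is a sketch using plain hashes as hypothetical stand-ins for the request and response cookie jars. The merged_cookies name is made up, and the merge order (response values win over request values) is an assumption rather than a quote from the patch:

```ruby
# Hypothetical stand-in for the patched method, working on plain hashes.
def merged_cookies(request_cookies, response_cookies)
  # Assumption: the response gets the last word on any cookie it sets.
  request_cookies.merge(response_cookies)
end

request_cookies = { "visitor_name" => "Dave" }

# A logout action that forgets to clear the cookie: the request value is
# still visible, so assert_nil would now fail, as it should.
forgotten = merged_cookies(request_cookies, {})

# A logout action that clears the cookie: the response overrides the request.
cleared = merged_cookies(request_cookies, { "visitor_name" => nil })
```

With only the response cookies visible, both cases would look identical to the test, which is exactly why it always passed.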

If you’re interested you can find the patch here.

Happy testing!

Copying a VM between Xen hosts

We recently moved a Xen virtual machine from one Xen host to another. The process involves simply copying across its disk partition (in this case an LVM2 partition using the device mapper) and copying across its config file:

1) Copy across the VM partition device.
On target host:

lvcreate -L10G -n vm1-disk xen
nc -l 12345 | dd of=/dev/mapper/xen-vm1--disk conv=nocreat bs=16065b

On source host:

xm shutdown vm1
dd if=/dev/mapper/vm1-disk bs=16065b | nc xen2 12345

2) Copy across the VM config file.

scp /etc/xen/vm1.cfg root@xen2:/etc/xen/ 

Some further explanation is probably needed for the first step. Creating the logical volume on the target host in advance ensures that the partition, once copied across, remains a block device rather than becoming an image file. Netcat (nc) provides a fast mechanism to transfer the partition over the ether (use SSH if you’re sensitive about your data). As for the flags to dd, the block size (bs) is set to 16065b, that is 16065 blocks of 512 bytes, which is the number of sectors in a cylinder so I’m told (it worked for me), and the nocreat flag tells dd not to create the output file, so a mistyped device path fails instead of quietly producing an image file.
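For the curious, here is the arithmetic behind that block size. The b suffix in dd multiplies by 512; the 255-head, 63-sectors-per-track geometry is my assumption about where the cylinder figure comes from, not something taken from the commands above:

```ruby
# Assumed traditional disk geometry (hypothetical for any given disk).
HEADS = 255
SECTORS_PER_TRACK = 63
SECTOR_BYTES = 512 # what dd's "b" suffix multiplies by

sectors_per_cylinder = HEADS * SECTORS_PER_TRACK
bytes_per_block = sectors_per_cylinder * SECTOR_BYTES

puts "bs=#{sectors_per_cylinder}b is #{bytes_per_block} bytes per dd block"
```

So each read and write moves roughly 7.8 MiB at a time, which goes some way to explaining why the copy is quick.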

Note: remember to shut down the VM first! This is an offline copy. If you want to minimise downtime I’m sure an LVM snapshot would work too.

Testing Apache and mod_rewrite using Test::Unit

Here at Reevoo we (like many others) use Apache as our webserver of choice and with this comes the venerable mod_rewrite.

Mod_rewrite can be used for a lot more than just redirecting pages, though: you can use it for forward and reverse proxying, redirection and URL rewriting based on various factors such as the HTTP host or request URI. However, there are myriad ways in which to shoot yourself in the foot!

We love testing, so when an article about testing mod_rewrite rules using Test::Unit by the guys at Viget Labs popped up in my feed reader I quickly popped round to take a look.

Using the redirect tester you can easily define shoulda style tests by doing something like this:

class ReevooRedirectTest < HTTPRedirectTest
  self.domain = ''

  should_redirect '/decidewhattobuy/blog', :to => '/decidewhattobuy', :permanent => true
  should_redirect '/blog', :to => '/decidewhattobuy'
end

This is very cool and makes working with mod_rewrite much less painful than it can be!
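If you’re wondering what a check like should_redirect boils down to, here is a rough equivalent using nothing but Net::HTTP. This is not the gem’s implementation; redirect_for is a hypothetical helper written for this example:

```ruby
require "net/http"
require "uri"

# Hypothetical helper: fetch a URL without following redirects and report
# where it points. Returns nil when the response is not a redirect.
def redirect_for(url)
  response = Net::HTTP.get_response(URI(url))
  return nil unless response.is_a?(Net::HTTPRedirection)

  {
    :to        => response["location"],   # value of the Location header
    :permanent => response.code == "301"  # 301 means moved permanently
  }
end

# Usage, with a placeholder host:
#   redirect_for("http://www.example.com/blog")
```

A test then only needs to compare the returned hash with the target and permanence it expects.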

The original code is a series of gists hosted on the Viget Labs GitHub page. To make them easier to use and manage, I packaged them up as a gem, which you can install as follows:

sudo gem sources -a &&
sudo gem install shadowaspect-http_redirect_test

and use it in your code like this:

require 'http_redirect_test'

Have fun!

detenc is a fast character encoding detector for Western European text. It can determine whether a file is encoded in US-ASCII, UTF-8, ISO-8859-15, WINDOWS-1252, or something else. It can distinguish ISO-8859-15 and WINDOWS-1252 where there is enough information: this means that Euro signs are handled correctly.

The program was written to help normalise the encoding of very large data feeds (of the order of several gigabytes) at Reevoo. It uses very little memory and can determine the encoding of a two-gigabyte file in under a minute.

We process a lot of data feeds from retailers here at Reevoo. If we’re lucky, we get to specify the format. Often, though, we have to make do with feeds that are already available. The quality of these can be variable, which means that we need to be liberal in what we accept—but not so liberal that we start importing bogus data.

One of the significant variables is character encoding. This is a poorly understood topic in general, and our experience reflects this. We get feeds in US-ASCII, UTF-8, ISO 8859-15 and Windows 1252.

As an aside, ISO 8859-1 is also a possibility. However, given that it doesn’t include the Euro sign, we can reasonably assume that any feeds we receive today are likely to be in ISO 8859-15 (which is very similar).

We need to turn everything into the canonical encoding — UTF-8 — before we start processing. Up until recently, we’d been using iconv for this, attempting each encoding in turn, and falling back to the next on failure. The naive detector loaded the file into memory, fed it through iconv, and wrote it back out. This didn’t work too well on big feeds — and by ‘big’ I mean 2 GB+. Working line by line over 30 million lines was still not good enough.

So I wrote a small C program to do the job. Detecting ASCII is easy: the high bit is never set. UTF-8 is a little harder, but can be done very reliably thanks to the self-synchronising characteristics of its byte sequences. Windows 1252 and ISO 8859-15 have a significant overlap, meaning that text may be in both; in this case, the program selects ISO 8859-15. However, a text that uses a byte value defined in one but not the other can only be in one encoding. Finally, a text may include byte values outside any of these ranges, in which case it’s unknown.
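The decision rules can be sketched in a few lines of Ruby. This is not the C program: it reads the whole input into memory and leans on Ruby’s built-in UTF-8 validation instead of a hand-written state machine. It also makes two assumptions of its own: bytes 0x80 to 0x9F are treated as evidence of Windows 1252 (ISO 8859-15 only has control codes there), and the five values Windows 1252 leaves undefined in that range make the text unknown. The detect_encoding name is made up for this example:

```ruby
# Bytes in 0x80-0x9F that Windows 1252 leaves undefined.
CP1252_UNDEFINED = [0x81, 0x8D, 0x8F, 0x90, 0x9D].freeze

# Hypothetical helper: classify a string of raw bytes.
def detect_encoding(data)
  bytes = data.bytes
  # ASCII: the high bit is never set.
  return "US-ASCII" if bytes.all? { |b| b < 0x80 }
  # UTF-8: delegate validation to Ruby rather than a state machine.
  return "UTF-8" if data.dup.force_encoding(Encoding::UTF_8).valid_encoding?

  # Bytes that distinguish the two 8-bit encodings (assumption, see above).
  c1 = bytes.select { |b| b.between?(0x80, 0x9F) }.uniq
  # Nothing distinguishing: the text fits both, so prefer ISO 8859-15.
  return "ISO-8859-15" if c1.empty?
  return "WINDOWS-1252" if (c1 & CP1252_UNDEFINED).empty?

  "UNKNOWN"
end
```

For example, the single byte 0xA4 (the Euro sign in ISO 8859-15) comes out as ISO-8859-15, while 0x80 (the Euro sign in Windows 1252) comes out as WINDOWS-1252.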

The program can scan a 2GB file in under a minute, which is a big improvement, and certainly good enough. It uses a few hundred kilobytes of memory, making it about 10,000 times better than the original naive implementation! It also features what I consider to be a legitimate use of goto in the UTF-8 validating state machine.

I’ve uploaded the code to GitHub

Grab it, build it, fork it — I hope it’s useful to someone else. It may be a good start for detecting among common encodings used in other locales.

Ceiling cat is watching us!


bybusy mod accepted by Apache

In a previous article Joel spoke about the problems we were having with our load balancing between Apache and Mongrel and his bybusy mod that attempts to solve the problem.

His patch has now been accepted by Apache and should hopefully make it into a future release.