bybusy mod accepted by Apache
Posted by lukeredpath on August 19th, 2008
In a previous article, Joel described the problems we were having with load balancing between Apache and Mongrel, and the bybusy mod he wrote that attempts to solve them.
His patch has now been accepted by Apache and should hopefully make it into a future release.
Collecting exceptions asynchronously using Beanstalkd and Hoptoad
Posted by lukeredpath on August 15th, 2008

Whilst there are already a number of options for collecting exceptions from your production Rails apps (such as exception_notifier), none of them are designed for collecting exceptions asynchronously. This means you are introducing a potential bottleneck in your request loop should any part of your exception collection process (e.g. sending the exception to a mail server) slow down for any reason.
At Reevoo, we already use beanstalkd for a number of applications, so it seemed like a perfect fit for collecting our exceptions. The end result was our exception_messaging plugin, which builds on top of our beanstalk_messaging plugin and lets you configure your Rails app to send exceptions (along with supporting request, environment and session data) to a beanstalkd queue.
We’ve had this running in production for a while now but we weren’t really doing anything useful with the data. We started work on a basic Rails application that consumed the messages from the queue and stored them in a database with the idea being that we could build some basic reporting tools on top of this data, but what with work being work, we never found the time to do this properly.
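To make the idea concrete, here is a minimal Ruby sketch of collecting exceptions asynchronously. It is not the exception_messaging plugin: an in-process Queue and a background thread stand in for beanstalkd and its consumer, the class name AsyncExceptionCollector is hypothetical, and the payload is serialised as JSON purely for illustration (the real plugin's message format may differ). The point is that the request cycle only pays for serialising and enqueueing; the slow part, delivery, happens elsewhere.

```ruby
require "json"

# Hypothetical stand-in for the beanstalkd setup: an in-process Queue plays the
# role of the tube, and a background thread plays the role of the consumer.
class AsyncExceptionCollector
  attr_reader :delivered

  # The block is whatever slow work happens off the request path
  # (mailing, posting to a third-party service, and so on).
  def initialize(&deliver)
    @queue = Queue.new
    @delivered = []
    @worker = Thread.new do
      # Queue#pop returns nil once the queue is closed and drained.
      while (payload = @queue.pop)
        @delivered << deliver.call(JSON.parse(payload))
      end
    end
  end

  # Called from the request cycle: serialise, enqueue and return immediately.
  def report(exception, request_data = {})
    payload = {
      "class"     => exception.class.name,
      "message"   => exception.message,
      "backtrace" => Array(exception.backtrace),
      "request"   => request_data
    }
    @queue.push(JSON.generate(payload))
    nil
  end

  # Stop accepting work and wait for queued exceptions to be delivered.
  def shutdown
    @queue.close
    @worker.join
  end
end
```

If delivery slows down or fails, report still returns as soon as the payload is queued, which is exactly the property we wanted from beanstalkd.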
Enter Hoptoad
Hoptoad is a third-party service that you can post your exception data to. It is exactly the kind of app we would have liked to build; fortunately for us, somebody has already done the hard work. It collects and aggregates your exception data and presents it in an easy-to-use interface. Oh, and currently, it’s free.
Hoptoad provide a simple Rails plugin that you can drop into your Rails project for synchronous integration with their service. They also publish a RESTful API for retrieving and posting data.
Out of the box, the plugin is probably fine for most users. For high-traffic websites where performance is vital, or where you want to avoid making your request cycle depend on a third-party service, you need a more robust solution. This is where Beanstalkd comes in.
Rather than reinvent the wheel, I wanted to reuse the existing plugin code to send the exceptions collected from our queue. However, the parts of the plugin I wanted (the code that wraps their API and submits exception data) were tightly coupled to the Rails integration and configuration process. Fortunately, the code was hosted on GitHub, so I created a fork and began refactoring the plugin into loosely coupled components. At the time of writing, much of the code has been refactored and can be used standalone, although there is still some work left to do. I’m hoping the guys at Hoptoad (who have been very receptive to my enquiries and suggestions) will merge it into the master repository. You can grab the refactored plugin from a branch in my GitHub fork.
Consuming queued exceptions and sending them to Hoptoad
Once the refactoring work was done, sending our collected exception data was fairly trivial. I had to make a few changes to our exception_messaging plugin to make sure we were collecting all of the data we needed (simplifying a lot of the code in the process), but once we were collecting exception data in the correct format, all that was left was to write a simple Ruby script that used the updated plugin and a Beanstalk::QueuePoller (courtesy of our beanstalk_messaging plugin) to process the messages:
require 'beanstalk_configuration' # sets up $queue_manager
require 'hoptoad/standalone'      # the refactored, Rails-independent notifier

Hoptoad.configure do |config|
  config.api_key = 'your_projects_hoptoad_api_key'
end

notifier = Hoptoad::Notifier.from_config(Hoptoad.config)

# Poll the exceptions queue and forward each message to Hoptoad.
Beanstalk::QueuePoller.new($queue_manager).poll(:the_exceptions_queue) do |message|
  notifier.notify(message.ybody) # ybody decodes the YAML message body
end
For more information on how to poll Beanstalk queues, see the instructions for the beanstalk_messaging plugin. To configure your app to use beanstalkd for collecting exceptions, see the instructions for the exception_messaging plugin.
Any problems or questions? Leave a comment.
Fixing uneven load balancing between Apache and Mongrel for Ruby on Rails applications
Posted by joelgluth on July 30th, 2008
Our site has some pages that can take a while to return. In the short to medium term that is probably unavoidable, and overall performance is good. Anecdotally, though, users and testers have been experiencing poor performance on pages that we know should render quickly or even instantly (because they’re cached). What is going on?
Setup: Apache, Mongrel, RoR
Our main site setup is extremely straightforward: We use Apache and mod_proxy_balancer to field incoming requests, which are then farmed out to a group of Mongrel instances that run our Ruby on Rails application.
Mongrel queues requests and is effectively single-threaded with reference to Rails
Mongrel is multi-threaded, but it locks a mutex as soon as it starts Rails processing. The result, for most of us, is that it processes one request at a time, potentially with a queue of pending requests waiting behind it. If a request takes a long time, anything queued up behind it, no matter how trivial, waits. Our pack of Mongrels should be big enough to avoid this situation with the sort of traffic we currently get, but is it?
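A toy Ruby model makes the cost of that mutex easy to see. This is not Mongrel's code: it simply assumes a single Mongrel handles requests strictly one at a time in arrival order, and the paths and timings are made up.

```ruby
# Toy model of one Mongrel whose Rails work is guarded by a single mutex:
# requests are handled one at a time, in arrival order, so each one waits
# for everything queued ahead of it. Times are in seconds.
Request = Struct.new(:path, :arrival, :duration)

def serve_serially(requests)
  free_at = 0.0 # when the mutex next becomes available
  requests.sort_by(&:arrival).map do |req|
    start = [free_at, req.arrival].max # wait until the mutex is free
    free_at = start + req.duration     # then hold it for the whole request
    { path: req.path, waited: start - req.arrival }
  end
end

results = serve_serially([
  Request.new("/slow_report", 0.0, 8.0), # hypothetical slow page
  Request.new("/", 0.5, 0.05)            # cached front page
])
results.each { |r| puts "#{r[:path]} waited #{r[:waited]}s" }
```

The cached front page takes a twentieth of a second to render, yet in this model it waits 7.5 seconds behind the slow report.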
Apache’s load balancer
Apache’s mod_proxy comes with two load balancing methods: “byrequest” (the default) and “bytraffic”. Both are historical balancers: they ensure that cumulative load is distributed evenly between workers, but they don’t particularly care what the current state of a given worker is. It seems entirely possible that Apache could assign a request for a slow page to worker A, sprinkle short-lived requests across its friends B and C, then assign another request to A because it’s A’s turn, even though A is still busy and B and C are both idle. And so on.
This would certainly explain “fast” pages sometimes coming back really slowly, and the overall server load wouldn’t even have to be very high.
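Here is a toy Ruby model of that scenario. It assumes that byrequest with equal weights behaves like plain round-robin, and treats each worker as a single-threaded Mongrel serving its own queue in order; the worker names and timings are made up.

```ruby
# Toy model of a purely historical balancer: requests go to workers in strict
# rotation, whether or not the chosen worker is already busy.
def round_robin(requests, workers)
  free_at = Hash.new(0.0) # worker => time it finishes its current queue
  requests.each_with_index.map do |(arrival, duration), i|
    worker = workers[i % workers.size]
    start = [free_at[worker], arrival].max
    free_at[worker] = start + duration
    # Other workers that were already idle when this request arrived.
    idle_elsewhere = workers.count { |w| w != worker && free_at[w] <= arrival }
    { worker: worker, waited: start - arrival, idle_elsewhere: idle_elsewhere }
  end
end

# [arrival, duration] in seconds: one slow page, then three quick ones.
timeline = round_robin([[0.0, 10.0], [1.0, 0.1], [2.0, 0.1], [3.0, 0.1]], %w[A B C])
timeline.each { |t| p t }
```

The fourth request lands on A because it is A's turn, and waits 7 seconds while B and C are both idle.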
What are our mongrels actually doing?
Mongrel’s logging and debugging output are not terribly helpful out of the box, at least not for finding out what’s going on with queued requests. Fortunately, the truly awesome proctitle will tell us exactly what we want to know, right there in the output of ps. It’s almost as if other people have had this problem before…
Drop the plugin into Rails, restart the mongrels, and let’s spectate a while.
joel@hiscomputer: $ watch -n 0 'ps ax | grep mongrel' # real-time and everything :)
17209 ? R 18:22 mongrel_rails [9001/2/1524]: handling 127.0.0.1: GET /reviews/mpn/take_2/tt45920
17213 ? R 14:25 mongrel_rails [9002/1/1321]: handling 127.0.0.1: GET /reviews/mpn/humax/lu23_td2
17217 ? R 14:50 mongrel_rails [9003/1/1190]: handling 127.0.0.1: GET /reviews/mpn/navman/s30
17221 ? S 12:11 mongrel_rails [9005/0/1032]: idle
Brilliant – the interesting bits on the first line are the ‘2’, telling us that the Mongrel on port 9001 has two requests to serve (one active and one queued), and that it’s currently serving a reviews page for Grand Theft Auto IV. Already we can see that Apache has given the 9001 Mongrel a request when it was doing something else, even though the 9005 Mongrel could have served it faster.
So, our hypothetical scenario is a real one, but how bad does it get? And how much of it is down to unhelpful load-balancing?
Digression: Siege
We use a number of different tools to automate web traffic for testing purposes. Our current favourite back-of-the-napkin number-generator is Siege. Siege will take a list of URLs and happily blast away at them at a rate and concurrency, and for a duration, that can be tweaked in arbitrary ways to explore different aspects of server performance. In this case we want to see a wide variety of requests, and to squish them temporally to see just how unevenly individual Mongrels end up getting loaded. Since we suspect that live users are hitting this problem, we can just take the request URLs from a day’s worth of our live Apache logs. Something like:
joel@hisproductionserver$ cut -d ' ' -f 7 /var/log/httpd/access.log | perl -p -e 's{^}{http://testserver}' > urls.txt
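If you would rather not rely on field positions in a shell pipeline, the same extraction is a few lines of Ruby. This sketch assumes Apache's common or combined log format, where the request path is the seventh space-separated field. Note that String#split(" ") collapses runs of whitespace, whereas cut treats every single space as a separator; for well-formed log lines that makes no difference.

```ruby
# Pull the request path out of each access-log line and point it at the
# test server.
def urls_from_log(lines, host = "http://testserver")
  lines.map { |line|
    path = line.split(" ")[6] # 7th field, e.g. /reviews/mpn/navman/s30
    path && "#{host}#{path}"
  }.compact # drop blank or truncated lines
end

# Usage (not run here):
#   File.write("urls.txt", urls_from_log(File.foreach("/var/log/httpd/access.log")).join("\n"))
```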
On my current test box, I have eight Mongrels running. I set my .siegerc file up for eight concurrent Siege users, a two-minute duration and the urls.txt file generated a moment ago, and proctitle shows me this:
13776 ? Rl 0:17 mongrel_rails [9000/4/17]: handling 127.0.0.1: GET /reviews/mpn/stoves/600sidlm
13779 ? Sl 0:04 mongrel_rails [9001/0/20]: idle
13782 ? Sl 0:05 mongrel_rails [9002/1/19]: handling 127.0.0.1: GET /reviews/mpn/sharp/lc20s5ebk
13785 ? Rl 0:17 mongrel_rails [9003/1/18]: handling 127.0.0.1: GET /reviews/mpn/stoves/600sidlm
13789 ? Rl 0:15 mongrel_rails [9004/3/17]: handling 127.0.0.1: GET /reviews/mpn/sharp/lc20s5ewh
13793 ? Rl 0:13 mongrel_rails [9005/2/17]: handling 127.0.0.1: GET /browse/product_type/routers
13796 ? Sl 0:12 mongrel_rails [9006/0/18]: idle
13800 ? Sl 0:12 mongrel_rails [9007/0/18]: idle
We can see that Apache is distributing the load fairly evenly over time (the ‘17’ in ‘[9000/4/17]’), but is doing badly at distributing it evenly at any given moment. The 9000 server has four pending requests, while 9001, 9006 and 9007 are sitting idle. The number of concurrent users is fairly low, and already half of them are having an unnecessarily bad time!
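Reading these snapshots by eye gets tedious, so here is a small Ruby sketch that tallies one. It assumes the reading of proctitle's [port/pending/handled] status given above (the middle number is the active request plus any queued ones), counts requests queued behind a busy Mongrel and Mongrels with nothing to do, and reports the smaller of the two as a rough measure of requests an idle Mongrel could have picked up straight away. The method name is ours.

```ruby
# Matches proctitle's "[port/pending/handled]" status in a ps line.
STATUS = %r{mongrel_rails \[(\d+)/(\d+)/(\d+)\]}

def summarize_snapshot(ps_output)
  mongrels = ps_output.each_line.map { |line|
    m = STATUS.match(line) or next # skip lines that aren't Mongrels
    { port: m[1].to_i, pending: m[2].to_i, handled: m[3].to_i }
  }.compact
  # Requests queued behind an active one (pending minus the active request).
  waiting = mongrels.sum { |s| [s[:pending] - 1, 0].max }
  idle    = mongrels.count { |s| s[:pending].zero? }
  # Rough upper bound on requests an idle Mongrel could have taken at once.
  { waiting: waiting, idle: idle, misplaced: [waiting, idle].min }
end
```

Fed the eight lines above, it reports six waiting requests and three idle Mongrels.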
Monit as a way of taking snapshots
Like a lot of people, we use monit to make sure our servers are up, and to kick them when they’re down so they get up again. It can be configured to kill and restart Mongrel processes that take more than some number of seconds to respond, but we can repurpose it to take snapshots of ps output and append them to a file instead using a Very Small Shell Script – the camera instead of the elephant gun, if you like. proctitle is just so handy. Each of our Mongrels has a .monitrc entry that looks like this:
if failed port 9001 protocol http # check for response
   with timeout 5 seconds
   then exec "/home/joel/bin/slow_request.sh 9001"
where, if a server takes more than five seconds to respond, slow_request.sh simply locates the process ID of the Mongrel running on port 9001 and records the output of ps for that process. The results were very suggestive; right at the top, we see lines like this:
17213 ? S 27:27 mongrel_rails [9002/4/2451]: handling 127.0.0.1: GET /reviews/mpn/henry/numatic
1148 ? S 2:48 mongrel_rails [9000/10/240]: handling 127.0.0.1: GET /search
1148 ? S 3:32 mongrel_rails [9000/5/314]: handling 127.0.0.1: GET /
and even
2955 ? S 11:41 mongrel_rails [9001/17/1010]: handling 127.0.0.1: GET /browse/product_type/Tents
At one point, a single long-running request had managed to force sixteen other site visitors’ requests to wait! There were quite a few lines (once monitoring had been running for a while) that were unlike these examples: each showed only a single pending request, and it wasn’t necessarily for a slow page. Often, in fact, it was a request for the front page, which should be quick in the first instance, and cached to within an inch of its life in the second. Performance appears to be degrading over time. We’ll look at that later, because a solution to our uneven load balancing problem has suggested itself.
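The slow_request.sh script itself isn't reproduced in this post. As a rough Ruby equivalent, assuming its job is just to pick out the Mongrel for one port from ps output and append that line to a log, something like the following would do. It matches on the port in the proctitle status rather than looking up the process ID as the original script does, and the method names and default log path are placeholders.

```ruby
require "time"

# Select the ps lines for the Mongrel serving `port`, using the port number
# that proctitle puts at the start of its status, e.g. "[9001/17/1010]".
def mongrel_lines(ps_output, port)
  ps_output.each_line.select { |line| line.include?("mongrel_rails [#{port}/") }.map(&:strip)
end

# Append a timestamped snapshot for one port; meant to be run from a monit
# exec action. The default log path is a placeholder.
def record_snapshot(port, log_path = "/tmp/slow_requests.log")
  lines = mongrel_lines(`ps ax`, port)
  File.open(log_path, "a") do |log|
    lines.each { |line| log.puts("#{Time.now.iso8601} #{line}") }
  end
  lines
end
```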
bybusy mod
Out of the box, Apache offers the two load balancing methods I noted earlier. This being Apache, you can bet that they will have made it fairly easy to add more, and indeed this turns out to be the case. We added a third “bybusy” lbmethod, which can be configured in the same way as the others. It was initially very simple: for each proxy worker in the web server, increment a “busy” counter when assigning a request to that worker. In the post-request hook, decrement the counter again. When choosing a worker, simply pick the one with the lowest “busy” value.
Simple to say, and thanks to the general cleanliness and friendliness of the Apache code base, simple to do. Attacking it with Siege resulted in exactly what one would hope for: Mongrels absorbed load evenly even when concurrent requests greatly outnumbered the available workers, and most importantly there was never a case where an incoming request was assigned to a Mongrel that was already doing something while another sat idle.
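The selection rule is small enough to model in a few lines of Ruby. This is a simplified sketch of the idea as described above, not the C patch: BusyBalancer and its methods are hypothetical names, release stands in for the post-request hook, and ties are broken by list position.

```ruby
# Simplified model of "bybusy": each worker has a busy counter that goes up
# when a request is assigned and down when it completes, and each new request
# goes to the worker with the lowest counter.
class BusyBalancer
  Worker = Struct.new(:name, :busy)

  def initialize(names)
    @workers = names.map { |name| Worker.new(name, 0) }
  end

  # Least busy wins; on a tie, the worker earliest in the list wins.
  def acquire
    worker, _index = @workers.each_with_index.min_by { |w, i| [w.busy, i] }
    worker.busy += 1
    worker.name
  end

  # Stand-in for the post-request hook.
  def release(name)
    worker = @workers.find { |w| w.name == name }
    worker.busy -= 1
  end
end

lb = BusyBalancer.new(%w[9000 9001 9002])
first  = lb.acquire # "9000": everyone is idle, so list order decides
second = lb.acquire # "9001": 9000 is still busy
lb.release(first)
third  = lb.acquire # "9000" again: it is idle and first in the list
```

Under sparse traffic, where each request finishes before the next arrives, every request in this model goes to the first worker in the list, which is the inequity discussed under Refinement.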
Refinement
This is more of a nicety than anything else, but it would be good to know that grabbing a random Mongrel’s log for period X gives me a representative sample of what’s been going on across all of them. As it stands, the bybusy method will, over time, favour Mongrels higher up the load balancer’s list: when several workers share the lowest “busy” value, the first one in the list wins. A little reflection shows that this inequity gets much worse when traffic is sparse, because then most workers are idle most of the time and nearly every request goes to the first of them.
We ended up refining bybusy a little, by using the “byrequest” method as a tie-breaker between workers with identical “busy” values (most frequently, when they are idle). So we get the moment-to-moment balancing we wanted to start with, but also the cumulative balancing that we had originally and which leads to nicely-balanced log files. A small thing, I know, and Mongrels don’t get tired, but it seemed like a good idea.
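Here is a hedged Ruby model of bybusy with a byrequest-style tie-breaker; again, it is not the patch itself. It assumes a weighted counter along the lines of byrequest (every selection adds each worker's lbfactor to its lbstatus, and the chosen worker then has the total subtracted), which with equal weights amounts to idle workers taking turns. The class name is hypothetical, and Apache's real balancer also skips workers that are disabled or in an error state.

```ruby
# Simplified model of bybusy with a byrequest-style tie-breaker.
class TieBreakingBusyBalancer
  Worker = Struct.new(:name, :busy, :lbstatus, :lbfactor)

  def initialize(names, lbfactor: 1)
    @workers = names.map { |name| Worker.new(name, 0, 0, lbfactor) }
  end

  def acquire
    total = 0
    @workers.each do |w|
      w.lbstatus += w.lbfactor # everyone accrues credit on each selection
      total += w.lbfactor
    end
    # Lowest busy count first; among ties, highest lbstatus; then list order.
    chosen, _index = @workers.each_with_index.min_by { |w, i| [w.busy, -w.lbstatus, i] }
    chosen.lbstatus -= total # the winner pays back the credit handed out
    chosen.busy += 1
    chosen.name
  end

  # Stand-in for the post-request hook.
  def release(name)
    worker = @workers.find { |w| w.name == name }
    worker.busy -= 1
  end
end

lb = TieBreakingBusyBalancer.new(%w[9000 9001 9002])
sparse = 4.times.map { n = lb.acquire; lb.release(n); n }
```

With one request at a time, this model rotates through 9000, 9001, 9002 and back to 9000, while a busy worker is still passed over whenever an idle one exists.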
UPDATE: This has been submitted as a patch to httpd; for the source and its progress, see Apache Bugzilla.
SimpleConfig on GitHub
Posted by lukeredpath on July 3rd, 2008
For all you Git fanatics out there, I’ve just pushed a copy of the SimpleConfig plugin to GitHub, where you can fork it and contribute to your heart’s content.
The official repository will always be the public Subversion mirror (once we’ve got it up and running!), and my Git fork may contain experimental features or enhancements that haven’t made it into production use, but it gives you the chance to hack away.
Welcome to Reevoo Labs
Posted by kylemcginn on July 3rd, 2008
Welcome to the re-launch of labs.reevoo.com, a place where we (the team at Reevoo) try our best to contribute a little something back to the community.
To kick things off, we’ve brought across some of our existing projects from the previous labs site, but also added a few new things that we felt were ready for prime-time:
- Mocha – the Ruby mocking/stubbing library based on JMock syntax, used by Rails core
- NEW! – Beanstalk Messaging plugin – a Rails plugin for managing, polling and communicating with the excellent Beanstalkd messaging queue
- NEW! – Simple Config – a Rails plugin that makes it easy to set up application-wide configuration/settings for each of your development environments, separate from Rails’ own environment files, and provides an object-oriented way of accessing those settings throughout your application.
- NEW! – Exception Messaging plugin – This plugin builds upon the Beanstalk Messaging plugin and allows you to forward your application exceptions to a beanstalkd queue for offline processing.
Please watch this space: we have a few more things on their way “very soon”™ that were nearly ready for Labs. A couple of them are more Rails-infrastructure related than the current set of projects we’ve pushed live today.
Kyle McGinn / CTO / Reevoo

