Improving the Performance of Drupal's Initial Page Request

When troubleshooting performance problems with any web site, I first look at the problem to try and break it down into pieces.  Where is the biggest problem?  One tool I use to do that is Firebug.  If you don't know much about Firebug, read my article Using Firebug to Figure out Why Your Website is Running Slow.

The first step in breaking performance problems into pieces is finding out whether the problem is with the initial page request or with the subsequent assets that are loaded.  See this Firebug screenshot of the Net panel:

When I talk about the "initial page request", I'm talking about the very first line.  That first request fetches the entire HTML source of the page.  Only once that source has been downloaded does the web browser know to make a bunch more requests for assets.  Those are all the lines from the second one on, and they include CSS, JavaScript, and images.

This article covers improving the performance of this initial page request.  I will cover improving the subsequent asset loading in an upcoming blog post.

I will list out what to look at in order of how likely each item is to be the cause of your problem in a typical Drupal site.  I typically follow a strategy like this to save time and get right to the problem.

It's important to note the time above from the Firebug output.  It is 516ms.  When looking at these options, you want to be able to break that time down into different components, and also remember what it was initially so you can change one thing, test, and repeat.
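To make the "change one thing, test, repeat" loop a bit less manual, here is a minimal sketch of a timing helper.  It is only an illustration: the function names are made up for this example, and `fake_fetch` is a hypothetical stand-in for whatever actually requests your page.  It reports the median of several runs so that one unusually slow or fast request does not skew your baseline.

```python
import statistics
import time


def median_request_ms(fetch, runs=5):
    """Call fetch() `runs` times and return the median wall-clock time in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def fake_fetch():
    # Hypothetical stand-in for the real page request, so the sketch
    # runs without a network connection.
    time.sleep(0.01)


baseline_ms = median_request_ms(fake_fetch)
print(f"baseline: {baseline_ms:.0f} ms")
```

To use it against a real site, you would swap `fake_fetch` for a function that downloads your slow page, record the number, make one change, and run it again.  The figure will not match Firebug exactly, since it is measured from a script rather than from the browser.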

Too Many Database Queries Needed to Generate the Page

To see how many database queries are needed, and how long they take, use the Devel module.  Go to its configuration at Configuration | Development | Devel Settings.  Select the checkbox for "Display Query log", and also choose to Sort By duration, which is more useful than sorting by source.  Save those settings, then visit the page that is slow.  At the bottom of the page, you will see some extra output that looks like this:

There are a few important items here.  You see the overall number of queries, the total time it took them to execute, and for each query you see the time and number of executions.

We want this information not only to look at it and see if any queries are taking too long to run, but we also want to use it as a base for our benchmarking when we start making changes.

There are a few ways we can reduce the number and overall time of access to the database.

Page and Block Caching

With page caching, instead of running all those queries to gather the information used in making the web page, Drupal stores the contents of the finished web page in a separate database cache table so that it can be recalled more quickly.  If you have 10 people visiting the site from different computers, Drupal first looks in the database cache table to see if the page is there, and if it is, it just gives them the page.  Think of saving the output of 50 separate queries so that it is accessible with a single query.  You are obviously reducing the number of SQL queries required by a lot.  What the page cache table actually stores is the HTML content of the page.
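The idea of trading 50 queries for a single lookup can be sketched with a toy model.  This is not Drupal's implementation: the class name, the dictionary standing in for the cache table, and the figure of 50 queries per page are all assumptions made for illustration.

```python
class TinySite:
    """Toy model of page caching; not Drupal's actual implementation."""

    def __init__(self, queries_per_page=50):
        self.queries_per_page = queries_per_page
        self.queries_run = 0
        self.page_cache = {}  # stands in for the database cache table

    def _build_page(self, path):
        # Pretend each page needs many separate queries to assemble.
        self.queries_run += self.queries_per_page
        return f"<html><body>Content for {path}</body></html>"

    def get_page(self, path):
        # Every request costs one lookup against the cache table.
        self.queries_run += 1
        if path not in self.page_cache:
            self.page_cache[path] = self._build_page(path)
        return self.page_cache[path]


site = TinySite()
site.get_page("/blog")            # first visitor fills the cache
after_first = site.queries_run    # 1 lookup + 50 build queries
site.get_page("/blog")            # second visitor is served from the cache
after_second = site.queries_run   # only 1 more lookup
print(after_first, after_second)
```

The first request pays for the lookup plus the full build, while every later request for the same page pays for the lookup alone.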

One gotcha with Page Caching is that it only optimizes the page load time for Anonymous users.  When you are logged in, you might have blocks on the page that are customized for you (think of My Recent Posts).  If Drupal served everybody the same page, they would see your customized information, so Drupal does not use the Page Cache for Authenticated users.  This allows you to turn Page Caching on and still get the benefit for Anonymous user page load times without breaking the site for Authenticated users.

There is another caching setting, "Cache blocks", that helps to optimize the site for Authenticated users.  It fills in the gaps by caching those blocks that might be different per user.  It takes up more space in the cache, but it is better than nothing.

To enable Page Caching, go to Configuration | Development | Performance and select the checkbox next to "Cache pages for anonymous users".  Also select the "Cache blocks" checkbox.  This is the one that works for Authenticated users: the blocks are cached, which means they can be assembled into an entire page more easily.  See screenshot:

To show you how big of an impact this can have, take a look at a Firebug output before turning on caching and viewing a page as an Anonymous user:

And after turning on caching:

You can see it shows the page loading in about 1/3 of the time.  Remember when benchmarking that the first request to a page fills the cache, and the second request uses it.

To take this one step further, you can change out your Drupal caching engine to not store to a database table, but to instead store it in memory using Memcached.  So if you have plenty of RAM on your server, you can eliminate a LOT of database queries and operate even faster.

Remember what caching is, though: a whole page is stored in the database and served up as-is every time.  What if you actually change that page, say you make a new blog post and want your home page to show it?  With page caching on, it will not show up until the cache is cleared.  By default, Drupal clears the cache every time cron is run.  So as always, it's a tradeoff.  You can clear the cache manually each time you post something new, or if it's OK that a new post doesn't show up for a few hours (based on your Drupal cron period setting), then you can take advantage of some serious performance improvements.
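The staleness tradeoff is easy to see in a small sketch.  Again this is a toy model with made-up names, where `clear_cache` stands in for either a cron run or a manual cache clear.

```python
class CachedHomePage:
    """Toy home page that keeps serving its cached HTML until cleared."""

    def __init__(self):
        self.posts = ["First post"]
        self.cached_html = None

    def render(self):
        if self.cached_html is None:
            items = "".join(f"<li>{title}</li>" for title in self.posts)
            self.cached_html = f"<ul>{items}</ul>"
        return self.cached_html

    def add_post(self, title):
        # Adding content deliberately leaves the cached page untouched.
        self.posts.append(title)

    def clear_cache(self):
        # Stands in for a cron run or a manual cache clear.
        self.cached_html = None


home = CachedHomePage()
home.render()                  # fills the cache
home.add_post("Second post")
stale = home.render()          # still the old page
home.clear_cache()
fresh = home.render()          # rebuilt, now includes the new post
print("Second post" in stale, "Second post" in fresh)
```

Until the cache is cleared, visitors keep getting the page as it looked when it was first cached.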

Some of the extra settings on the Performance page also deal with this: there are settings for Minimum Cache Lifetime and Expiration of Cached pages.  See this screenshot:

So you can fine tune even further how long you would like your cached entries to live.  Just be realistic about the need to show the latest content.  If you are posting a few times a day, you can do with hours; if you only post a few times a week, a day is just fine.  But if you want to post and then push links out to social sites about your content, you might want to clear the cache manually.

As a final note, along with page and block caching, if you have a lot of Drupal Views that have blocks or pages showing slow performance, you have to make a special trip into the Views settings and change the cache settings there.  You can fine tune the cache settings per View, and even separately for Blocks and Pages within a single View.  Hopefully if you have a Views block or page causing you performance issues, you see that in the Devel query log and can pinpoint it back to which View it is coming from.

More Caching, MySQL Query Caching

A quick, easy thing to do to reduce the burden of lots of really small, simple database queries is to use the MySQL Query Cache.  This sets aside a portion of memory to save the output of a database query so that it can be looked up if the output of the query has not changed since the last time you issued the query.  Basically, if a particular database table has not had any changes to it, then any queries on that table can go in the query cache and stay there.  As soon as a change happens to a table, all queries on that table that are in the query cache are invalidated.
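The table-level invalidation rule described above can be sketched like this.  It is a simplified model for illustration only, not how MySQL implements the query cache internally, and the class and method names are invented for the example.

```python
class ToyQueryCache:
    """Toy query cache that drops every cached query on a table when it changes."""

    def __init__(self):
        self.results = {}   # SQL text -> cached result
        self.by_table = {}  # table name -> set of SQL texts that read it

    def select(self, sql, table, run_query):
        """Return (result, was_cache_hit)."""
        if sql in self.results:
            return self.results[sql], True
        result = run_query()
        self.results[sql] = result
        self.by_table.setdefault(table, set()).add(sql)
        return result, False

    def write(self, table):
        # Any change to a table invalidates all cached queries on it.
        for sql in self.by_table.pop(table, set()):
            self.results.pop(sql, None)


query_cache = ToyQueryCache()
sql = "SELECT title FROM node WHERE nid = 1"
_, hit1 = query_cache.select(sql, "node", lambda: ["Hello"])
_, hit2 = query_cache.select(sql, "node", lambda: ["Hello"])
query_cache.write("node")  # e.g. somebody edits a node
row3, hit3 = query_cache.select(sql, "node", lambda: ["Hello, edited"])
print(hit1, hit2, hit3)
```

This is also why the query cache helps most on tables that are read far more often than they are written.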

Long story short, just give it a small amount of memory, like 8M or 16M.  By default, it is not enabled at all in the MySQL distributions that I use most, which is why it's worthwhile to at least turn it on.  Because it works at a low level inside the database, it does not require any application changes.

Is your Drupal Page Using Too Much Memory?

Another slowdown possibility once you have taken care of all the database stuff is that you just have too much traffic, and your Apache web server is configured to accept a lot of connections, but your web server does not have enough memory to serve all those connections.  Or it isn't configured to handle the connections you are getting, and they have to wait until another connection frees up.

Both situations cause web performance to grind to a halt, either from running out of RAM, which causes Virtual Memory disk I/O in the first case, or in the second case your requests have to pause until other requests are handled.

Another setting in the Drupal Devel module helps you figure out how much memory it takes to load a page.  Select the checkbox for "Display memory usage":

Then go to your slow page and look at the bottom for output that looks like this:

Your page will likely use a lot more.  I'm showing the memory usage from a demo site that does not have much being loaded on the page.

The setting we look at in Apache to see how many connections we can handle is the MaxClients setting in the main Apache configuration file (httpd.conf or apache2.conf, depending on your distribution).  If you look in your Apache error log, you can see whether you have been hitting this limit: it will say that you have reached your MaxClients limit.  If you look at your utilized swap space using the Linux command "top" and see that it's increasing, or is non-zero, inspect the processes and see how much RAM they are using.  If you are running into the connection problem, the solution is to increase your MaxClients.  If you are hitting max RAM and still want people to use your web site, get a bigger web server with more RAM.
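A common rule of thumb for sizing MaxClients is to divide the RAM you can spare for Apache by the memory one Apache process uses.  The sketch below does that arithmetic.  The function name and all three numbers are hypothetical, so substitute what "top" and the Devel memory output show on your own server.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_process_mb):
    """Rough upper bound on simultaneous Apache processes that fit in RAM.

    reserved_mb is whatever you set aside for the OS, MySQL, and so on.
    """
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0 or per_process_mb <= 0:
        raise ValueError("need positive usable RAM and per-process memory")
    return usable_mb // per_process_mb


# Hypothetical server: 2 GB of RAM, 512 MB reserved, 64 MB per Apache process.
estimate = estimate_max_clients(total_ram_mb=2048, reserved_mb=512, per_process_mb=64)
print(estimate)  # (2048 - 512) // 64 = 24
```

If the result is lower than your current MaxClients, the server can be pushed into swap under load.  If it is comfortably higher, you have room to raise the limit.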

You can try to make Drupal use less memory by disabling modules, but in almost all cases this is not an option, as it would reduce the functionality of your site.  Just pay a little more for web hosting.  Another pretty easy thing to do is to free up RAM on your web server by moving your database onto another server, which frees any RAM you had allocated to MySQL.  It's always a good idea to do this anyway, and it is typically one of the first things to look at when scaling up from the bare-bones setup of web server and database on a single server.

Too Many PHP Files Being Interpreted

PHP is an interpreted language, which means the files have to be compiled down into opcodes in order to actually be executed by the server.  One of the things you gain by using Drupal is flexibility, but you pay for it in the overhead of all those PHP files that need to be loaded to find all the hooks to run.  Luckily, there is an option out there that will store the intermediate code of PHP files in memory so that every request does not result in fetching all those PHP files off disk and interpreting them again.

The component is APC, the Alternative PHP Cache, sometimes called a PHP accelerator.

All you do is edit a section in your php.ini file, allocate some memory for it, and turn it on.  By turning it on, I also mean using the option apc.stat = 1, which checks the timestamp of the file to see if it has been modified before grabbing it from the cache.  That makes the development process work more smoothly, because as soon as you release code, APC will pick up the new copy.

Just set:

  • apc.enabled = 1
  • apc.shm_size = 16M
  • apc.stat = 1
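To show what the timestamp check behind apc.stat = 1 buys you, here is a minimal sketch of a cache that re-reads a file only when its modification time changes.  It illustrates the idea and is not APC's actual code: the class name is invented, and upper-casing the file contents is just a stand-in for compiling PHP to opcodes.

```python
import os
import tempfile


class StatCheckedCache:
    """Toy cache that recompiles a file only when its mtime has changed."""

    def __init__(self):
        self.entries = {}  # path -> (mtime in ns, "compiled" contents)
        self.compile_count = 0

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self.entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # timestamp unchanged, reuse the cached copy
        with open(path, encoding="utf-8") as handle:
            compiled = handle.read().upper()  # stand-in for compiling to opcodes
        self.compile_count += 1
        self.entries[path] = (mtime, compiled)
        return compiled


with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "index.php")
    with open(script, "w", encoding="utf-8") as handle:
        handle.write("old code")

    opcode_cache = StatCheckedCache()
    first = opcode_cache.load(script)
    second = opcode_cache.load(script)
    count_before_release = opcode_cache.compile_count

    # Simulate a code release: new contents and a different timestamp.
    with open(script, "w", encoding="utf-8") as handle:
        handle.write("new code")
    os.utime(script, ns=(2_000_000_000, 2_000_000_000))
    third = opcode_cache.load(script)
    count_after_release = opcode_cache.compile_count

print(count_before_release, count_after_release)
```

The repeated load is served from the cache, and the file is only compiled again once its timestamp differs from the one stored with the cached copy.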

I have run into some problems when the release process involves doing an svn export over the top of the existing code.  I still do it, but I've added a command that clears the APC cache after I export the code, so that I can be sure the new files are being run.  That is just a given part of my development scripts now, and it is an acceptable cost for the benefit you get.

While I recommend this for Drupal, I cannot recommend it for WordPress with any certainty.  I've had WordPress sites that just white screened due to APC problems even though no code was being changed, and the cause has remained a mystery to this day.  These were running on the same server as Drupal sites that were using APC and running fine.  I'm not saying don't do it, just be careful, and you can't say I didn't warn you.

Not Covering All Possibilities, Just the Big Ones

So there are a lot of additional little things to look at, but in my experience, these are the most likely culprits when your initial page load speed is suffering.