Making PHP Applications Cache-Friendly
Klaus A. Brunner
I'm running a public web-based forum that is read frequently
(about 10,000 hits per day), but gets relatively few postings (in the
range of 20 to 60 per day). The board software is Phorum, a nice open-source PHP application. Although the site has
other popular pages to offer, Phorum's read.php file is the
clear number one in my Apache hit and download volume statistics.
Most of these hits are apparently caused by users pressing "reload"
to check for updates every now and then.
With so few new postings, Phorum and its underlying DBMS (in our
case, PostgreSQL) clearly have to generate many identical responses
to identical queries, over and over. This wastes bandwidth and
server resources and makes browsing the forum feel slower than
necessary, particularly for users with slow connections. It also
renders the caching efforts of proxy servers virtually useless.
One approach to minimising redundant transmission of data is the
use of Last-Modified and If-Modified-Since headers as defined in HTTP/1.1.
In this scheme, each object returned by the webserver carries a date
of last modification (a.k.a. "validator"). A user agent or proxy
cache can store this value and, upon the next reload of the same
object, issue a conditional GET request with the If-Modified-Since
header set. The webserver will then use this header to decide
whether the client's copy of the object is still "fresh" (as recent
as the data on the server) or "stale" (older than the data on the
server). If it is fresh, there is no need to send the object again,
so the server responds with a brief "304 Not Modified" message
instead.
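The freshness decision described above can be sketched in a few lines of PHP. This is only an illustration, not code from Phorum or Apache: the function name client_copy_is_fresh() is made up for this example, and it assumes the header value arrives as a plain string (or null if the client sent an unconditional GET).

```php
<?php
// Hypothetical helper, not taken from Phorum: decide whether a client's
// cached copy is still fresh, given the raw If-Modified-Since value
// (null when the header is absent) and the object's last-modified
// time as a Unix timestamp.
function client_copy_is_fresh(?string $ifModifiedSince, int $lastModified): bool
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return false; // unconditional GET: always send the full object
    }
    // Some older browsers appended "; length=..." to the date; drop it.
    $datePart = explode(';', $ifModifiedSince, 2)[0];
    $clientTime = strtotime(trim($datePart));
    if ($clientTime === false) {
        return false; // unparsable date: play safe and send the object
    }
    // Fresh means the server's data is not newer than the client's copy.
    return $lastModified <= $clientTime;
}
```

When this returns true, the server can answer with "304 Not Modified" and no body; in every other case it sends the full object with a current Last-Modified header.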
Modern webservers and user agents (e.g., Apache/1.3,
Netscape Navigator 4.x and above, Internet Explorer) fully support
this technique. Apache automatically handles If-Modified-Since
requests for all static objects by default.
In the case of dynamic content as generated by PHP, we have to
take care of these things manually. We need to return a meaningful
Last-Modified header and handle If-Modified-Since requests so that
the user agent gets fresh data if and only if necessary. For
Phorum, this means that we have to keep track of database updates.
If the database has not changed since the client's last request, we
can simply return 304 without bothering the DBMS at all.
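For a PHP script, the manual handling might look roughly like the sketch below. It is a simplified illustration rather than the actual Phorum patch: conditional_response() is a hypothetical name, and the function expects an array shaped like PHP's $_SERVER plus the Unix timestamp of the last data change.

```php
<?php
// Hypothetical helper (not part of Phorum): work out the status code and
// headers for a dynamic page whose data last changed at $lastModified.
// $server is expected to look like PHP's $_SERVER array.
function conditional_response(array $server, int $lastModified): array
{
    // HTTP dates are always expressed in GMT.
    $headers = [
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
    ];

    $since = isset($server['HTTP_IF_MODIFIED_SINCE'])
        ? strtotime($server['HTTP_IF_MODIFIED_SINCE'])
        : false;

    // The client's copy is at least as recent as our data: no body needed.
    if ($since !== false && $lastModified <= $since) {
        return ['status' => 304, 'headers' => $headers];
    }
    return ['status' => 200, 'headers' => $headers];
}

// Usage sketch, at the very top of a script, before any output or DB access
// ($lastUpdate is an assumed variable holding the last-change timestamp):
//   $r = conditional_response($_SERVER, $lastUpdate);
//   foreach ($r['headers'] as $h) { header($h); }
//   if ($r['status'] === 304) { http_response_code(304); exit; }
```

Keeping the decision in a function that returns plain data, and sending headers only in the calling script, makes the logic easy to check without a web server.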
The basic approach I am using here is to touch a zero-length file
whenever the database is updated. The file's modification time will
then serve as the Last-Modified date. This is very simple to
implement, but rather primitive as it does not differentiate
between forums: if something is posted in forum X, all
forums in the same database are considered "updated". That's
clearly not very effective when you have lots of posting activity
in more than one or two forums.
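A minimal sketch of the marker-file idea follows, using PHP's touch() and filemtime(). The function names and the choice of marker path are assumptions made for this example; the database layer would call the first function after each successful write.

```php
<?php
// Hypothetical helpers; where the marker file lives is up to the caller.

// Call after every successful INSERT/UPDATE/DELETE on the forum tables.
function mark_database_updated(string $markerFile): bool
{
    // touch() creates the file if needed and sets its mtime to "now".
    return touch($markerFile);
}

// Returns the Unix timestamp of the last recorded update. If no marker
// exists yet, the current time is returned, so a missing file never
// leads to a "not modified" answer.
function last_database_update(string $markerFile): int
{
    clearstatcache(true, $markerFile); // avoid a stale cached mtime
    if (!is_file($markerFile)) {
        return time();
    }
    return filemtime($markerFile);
}
```

The marker file has to be writable by the web server's user, and it should sit outside the document root so that it cannot be fetched directly.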
A finer-grained scheme -- perhaps
down to tracking individual threads -- would be more appropriate in
that case. Additionally, there are some issues with Phorum's use
of cookies to flag unread messages; the workaround used here is
to allow a full reload periodically.
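One possible finer-grained variant, sketched here as an assumption and not something I have implemented in Phorum, keeps one marker file per forum. The directory layout, the file naming, and the function names are all hypothetical.

```php
<?php
// Hypothetical: one marker file per forum, named after the numeric forum id.
function forum_marker_path(string $dir, int $forumId): string
{
    return rtrim($dir, '/') . '/forum-' . $forumId . '.updated';
}

// Call after a posting has been written to the given forum.
function mark_forum_updated(string $dir, int $forumId): bool
{
    return touch(forum_marker_path($dir, $forumId));
}

// Last-Modified timestamp for one forum; falls back to "now" when the
// forum has no marker yet, so clients always get a full response then.
function forum_last_update(string $dir, int $forumId): int
{
    $path = forum_marker_path($dir, $forumId);
    clearstatcache(true, $path); // avoid a stale cached mtime
    return is_file($path) ? filemtime($path) : time();
}
```

With this layout, a posting in forum X only invalidates cached pages of forum X. Tracking individual threads would follow the same pattern with more files, at which point a timestamp column in the database may be the better tool.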
Please note that the following code is not release quality.
It works, but it's rather simplistic and should be considered a
"proof of concept". I simply patched Phorum
3.2.11 code where it seemed necessary to get quick results. My
changes are in bold print.
common.php:
include/header.php:
db/postgresql.php: (virtually identical for other databases)
Bottom Line
If you have an efficient way of tracking the freshness of your
data, implementing proper Last-Modified/If-Modified-Since behaviour
for PHP applications is very simple. It has clear benefits for both
server and client side, and usually no drawbacks. Considering that, it is
surprising how rarely it seems to be used by PHP authors.
Of course, this isn't all there is to creating cache-friendly PHP scripts. For further information, check Mark Nottingham's Caching Tutorial for Web Authors and Webmasters.
--Klaus A. Brunner