Figuring out where your hits are coming from and which
pages are being viewed is not too difficult if you use a
good log analyzer like Analog or something similar.
But if you want to pull up reports on the fly at any time,
you need to take a different route.
For about six months, I was importing my Apache logs into
an SQL database and then running ad-hoc queries against that.
Unfortunately, that required me to go out and manually grab
those log files, tweak the format a bit, and then import them. I wanted
something better: something that was continually updated,
flexible enough to simultaneously support multiple sites,
and multiple actions (page view, ad click, click-in, click-out,
plus more if needed).
I sat down and came up with the following table:
create table tbl_activity_log (
    fld_date      int,   -- ex. 19990101
    fld_hours     text,  -- which hour of the day?
    fld_remote_ip text,  -- IP of the visitor
    fld_action    int,   -- page view, click, etc.
    fld_special   text,  -- which page or category?
    fld_affil_num int    -- unique for each web site
);
Someone who really knows what they’re doing would have used a
timestamp field rather than an int for fld_date, but this works
for me. You could also add another column to hold the $PATH_INFO,
but I decided the Special column gives me what I want.
I also defined these indexes, as I knew I would be querying on
any of these fields:
create index idx_logger_date      on tbl_activity_log (fld_date);
create index idx_logger_hour     on tbl_activity_log (fld_hours);
create index idx_logger_ip       on tbl_activity_log (fld_remote_ip);
create index idx_logger_action   on tbl_activity_log (fld_action);
create index idx_logger_special  on tbl_activity_log (fld_special);
create index idx_logger_affil_num on tbl_activity_log (fld_affil_num);
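With the table and indexes in place, pulling a report on the fly is just a
query. For example, a report of page views per page for one site on one day
might look like this (the action code 1 for a page view and the site number 4
are assumed values here, not ones defined in the article):

```sql
-- Page views per page for site 4 on January 1st, 1999
select fld_special, count(*) as views
from tbl_activity_log
where fld_date = 19990101
  and fld_action = 1        -- assumed: 1 = page view
  and fld_affil_num = 4
group by fld_special
order by views desc;
```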
The next step was getting the info into the tables. It would be
a perfect world if all pages were served through PHP and all of your
various web sites existed on one box. Unfortunately, I have an array
of servers scattered all over the country, and I want to collect this
info from every server for every page view 24 hours a day.
So that required me to use the 1×1 pixel GIF trick. I have a GIF
on every page that looks like this:
<IMG SRC="http://www.yourserver.com/util/gif11.php3?c=4&s=phpbuildercom&b=77" height=1 width=1>
gif11.php3 is a simple script that resides on my central server.
I have included the source on the next page.
Since the GIF is on each page, and the random number at the end
(the b=xxxx) keeps it from being served out of the browser's cache, a
request is sent back to the central server for each and every page view.
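The real source is on the next page, but the idea is simple: read the
query-string variables, insert a row, and hand back a tiny image. A
hypothetical sketch might look like this (the credentials, the 1 = page view
action code, and the mapping of $c to the site number and $s to fld_special
are my assumptions, not taken from the actual script):

```php
<?php
// gif11.php3 -- hypothetical sketch only; see the next page for the
// real source.  With PHP3's register_globals, $c, $s, and $b arrive
// straight from the query string.
mysql_connect("localhost", "loguser", "secret");   // placeholder credentials
mysql_select_db("weblog");

// Assumed mapping: action 1 = page view, $s -> fld_special,
// $c -> fld_affil_num.
mysql_query("insert into tbl_activity_log values (" .
    date("Ymd") . ", '" . date("H") . "', '$REMOTE_ADDR', 1, '$s', $c)");

// Hand back a 1x1 transparent GIF so the <IMG> tag gets a valid image.
header("Content-type: image/gif");
readfile("1x1.gif");
?>
```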
If all of your pages and the database reside on one server, you don’t
need to use the GIF trick – you can insert the logging code into the
header of your page.
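In that single-box case, the header code is just the insert without the
GIF round-trip. A hypothetical header (again, the credentials, the action
code 1, the page name, and the site number 4 are assumed values):

```php
<?php
// Hypothetical page header for the single-server case: log the view
// directly instead of bouncing through the GIF.
mysql_connect("localhost", "loguser", "secret");
mysql_select_db("weblog");
mysql_query("insert into tbl_activity_log values (" .
    date("Ymd") . ", '" . date("H") . "', '$REMOTE_ADDR', 1, 'frontpage', 4)");
?>
```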
The same logging system can be used to track click-thrus on ad banners:
<A HREF="http://www.yourserver.com/util/adclick.php3?c=4&goto=*">
Just replace * with the URL you want to redirect to. Again here, the
$c variable should be changed for each web site.
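The article doesn't show adclick.php3 itself, but a hypothetical version
would log the click and then redirect (the action code 2, the 'banner'
value for fld_special, and the credentials are assumptions):

```php
<?php
// adclick.php3 -- hypothetical sketch: log the ad click, then send
// the visitor on to the advertiser.
mysql_connect("localhost", "loguser", "secret");
mysql_select_db("weblog");
mysql_query("insert into tbl_activity_log values (" .
    date("Ymd") . ", '" . date("H") . "', '$REMOTE_ADDR', 2, 'banner', $c)");

header("Location: $goto");   // $goto comes from the query string
exit;
?>
```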