I use sessions on my site. Anyone looking at a page is immediately given a session ID, which is passed from page to page. Sessions time out after 300 seconds. By storing the session ID in the logs as well, you suddenly have the route every user took through the site. So my logging table now looks like this:
create table logging ( timestamp BIGINT, remote_ip char(15), page text, refering_page text, session_id char(20) );
By adding this one bit of data we suddenly have a whole new way of looking at the data. Before we use this data we need to understand better how it is stored.
- A user enters the site.
- They are given a session ID.
- The logging table is updated (with timestamp, remote_ip, page, refering_page and session_id).
- The user clicks a link
- A different page is displayed.
- Back to step 3: the logging table is updated for the new page, with the same session ID.
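The loop above can be sketched in Python with the standard sqlite3 module. This is only an illustration, not the code my site runs: the function names `open_log` and `log_page_view` are made up, and I am assuming the 300 second time-out counts from the session's last logged page view.

```python
import secrets
import sqlite3
import time

SESSION_TIMEOUT = 300  # seconds, as described in the post


def open_log(path=":memory:"):
    """Open the database and create the logging table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "create table if not exists logging ("
        " timestamp BIGINT, remote_ip char(15), page text,"
        " refering_page text, session_id char(20))"
    )
    return conn


def log_page_view(conn, remote_ip, page, refering_page, session_id=None, now=None):
    """Record one page view and return the session ID to pass to the next page."""
    now = int(time.time()) if now is None else now
    if session_id is not None:
        # Assumption: a session expires 300 seconds after its last logged view.
        last_seen = conn.execute(
            "select max(timestamp) from logging where session_id = ?",
            (session_id,),
        ).fetchone()[0]
        if last_seen is None or now - last_seen > SESSION_TIMEOUT:
            session_id = None  # unknown or timed out, so start a new session
    if session_id is None:
        session_id = secrets.token_hex(10)  # 20 hex characters, fits char(20)
    conn.execute(
        "insert into logging values (?, ?, ?, ?, ?)",
        (now, remote_ip, page, refering_page, session_id),
    )
    conn.commit()
    return session_id
```

Each page calls `log_page_view` with the session ID it was handed and passes the returned ID on to the next page, so every row for one visit shares the same session_id.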
This is what I do, though I’m not saying it is the best way:
Select a timestamp start and end (e.g. the beginning and the end of the day).
Select the page you want to see how users got to. We will call this “pagex”.
Select all session IDs from the logging table where page = pagex and timestamp > timestamp_start and timestamp < timestamp_end.
We now have a list of session IDs of people that looked at pagex.
Then, for each session ID, get the pages that session looked at where the timestamp is before the one at which it looked at pagex.
Order this information by the timestamp and that is the user’s route.
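Those steps might look something like the sketch below, again in Python with sqlite3 against the logging table above. The function name `routes_to_page` is invented for the example, and I have assumed that when a session views pagex more than once in the window, the first view is the one that counts.

```python
import sqlite3


def routes_to_page(conn, pagex, timestamp_start, timestamp_end):
    """Return {session_id: [pages viewed before reaching pagex, oldest first]}."""
    # Sessions that viewed pagex inside the window, with the time of their
    # first view of it (assumption: the first view is the one of interest).
    arrivals = conn.execute(
        "select session_id, min(timestamp) from logging"
        " where page = ? and timestamp > ? and timestamp < ?"
        " group by session_id",
        (pagex, timestamp_start, timestamp_end),
    ).fetchall()

    routes = {}
    for session_id, reached_at in arrivals:
        # Everything this session looked at before it got to pagex.
        rows = conn.execute(
            "select page from logging"
            " where session_id = ? and timestamp < ?"
            " order by timestamp",
            (session_id, reached_at),
        ).fetchall()
        routes[session_id] = [page for (page,) in rows]
    return routes
```

A session that landed directly on pagex gets an empty list, which is itself useful: it tells you the visitor arrived from outside the site.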
This unfortunately gives us their whole route around the site before getting to pagex, which is more than we usually want. I overcome this by grouping the pages by how far back from pagex they were viewed and ordering each group by the most popular, in a kind of backstepping way, e.g. look at one page back, two pages back, three pages back, etc.
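One way to do the backstepping is sketched below. It is my reading of the idea rather than the exact grouping I run: `popular_steps_back` is a made-up name, and `routes` is assumed to be a list of per-session page lists, oldest first, each ending with the page viewed just before pagex.

```python
from collections import Counter


def popular_steps_back(routes, max_steps=3):
    """For each distance 1..max_steps, rank the pages seen that many views before pagex."""
    counts = {steps: Counter() for steps in range(1, max_steps + 1)}
    for route in routes:
        for steps in range(1, max_steps + 1):
            if steps <= len(route):
                # route[-1] is the page just before pagex, route[-2] the one before that.
                counts[steps][route[-steps]] += 1
    # most_common() orders each group from most to least popular.
    return {steps: counter.most_common() for steps, counter in counts.items()}


example_routes = [
    ["/index", "/news"],
    ["/index", "/news"],
    ["/search", "/index", "/news"],
    ["/index"],
]
for steps, ranking in popular_steps_back(example_routes).items():
    print(steps, "back:", ranking)
```

Short routes simply drop out of the deeper groups, so a page that tops the list at one step back is the link that most often leads to pagex.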
There are many more ways of looking at this data (I’m not sure what they all are, but I know that they are there). By looking at this data it becomes apparent that some links you have put in place are never used, or that certain pages are accessed from other sites more than they are from your own site. One good use is to see how spiders/bots navigate your site. If used correctly, you will have a better understanding of how users navigate your site, where the obvious links are and where there should be links.
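As one example of such a view, the sketch below counts how often a page was reached from your own site versus from elsewhere. It assumes refering_page holds either a full URL or a site-relative path, which may not match what your server logs; `referrer_split` and `own_host` are names invented for the example.

```python
import sqlite3
from urllib.parse import urlsplit


def referrer_split(conn, page, own_host):
    """Count views of `page` by whether the referring page was on our own site."""
    counts = {"internal": 0, "external": 0, "none": 0}
    rows = conn.execute(
        "select refering_page from logging where page = ?", (page,)
    )
    for (ref,) in rows:
        if not ref:
            counts["none"] += 1  # no referrer recorded (typed URL, bookmark, etc.)
            continue
        host = urlsplit(ref).hostname
        # A relative referrer such as "/news" has no host, so treat it as internal.
        if host is None or host == own_host:
            counts["internal"] += 1
        else:
            counts["external"] += 1
    return counts
```

A page where "external" outweighs "internal" is one that other sites link to more than you do yourself.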
Happy user following.