#native_company# #native_desc#
#native_cta#

Best Practices: Database Indexing Page 3

By Tim Perdue
on January 8, 2001

CREATE INDEX mytable_categoryid_userid_adddate
	ON mytable (category_id,user_id,adddate);

NOTICE:  identifier "mytable_categoryid_userid_adddate" 
	will be truncated to "mytable_categoryid_userid_addda"
CREATE

EXPLAIN SELECT * FROM mytable
	WHERE category_id=1 AND user_id=2
	ORDER BY adddate DESC;

NOTICE:  QUERY PLAN:

Sort  (cost=2.03..2.03 rows=1 width=16)
  ->  Index Scan using mytable_categoryid_userid_addda 
	on mytable  (cost=0.00..2.02 rows=1 width=16)

EXPLAIN
Now if you look at the explain output, it did something that we were
dreading. We don’t want a sort in there. I was hoping that the database
would be smart enough to do the query right without any extra hints, but
I guess I was overly optimistic.
To get around the sort step, we don’t need any extra indices, but we
do need to modify the query a bit. I’m going to give postgres an extra
clue about what I want it to do – I’m going to add the criteria fields to the
ORDER BY clause. This should technically not be necessary, because there is no
sorting to be done on those two fields, but it gives postgres the extra
kick in the pants that it needs sometimes.
EXPLAIN SELECT * FROM mytable 
	WHERE category_id=1 AND user_id=2
	ORDER BY category_id DESC,user_id DESC,adddate DESC;

NOTICE:  QUERY PLAN:

Index Scan Backward using 
	mytable_categoryid_userid_addda on mytable  
	(cost=0.00..2.02 rows=1 width=16)

EXPLAIN
There, now that’s perfect. Postgres now uses the index I wanted it to,
plus it’s smart enough to read the index backwards to avoid any sorting
at all. This may sound like a petty detail, but I guarantee you will
appreciate it when your database gets huge and/or you get millions of
page views per day.
Things can get fairly complex when you are joining multiple tables together,
especially if you are using selection criteria where
the fields are in more than one table. Generally, I try to avoid this as
much as possible, because it may mean the database has to join everything
together, then go through and eliminate rows that don’t match. That
can be incredibly expensive if you’re not careful.
In a situation like that, you should use the indexing strategy above
for each of the joined tables and then verify with an EXPLAIN command
that your indices are being used. If they are, you should be OK. If not,
you may want to create temp tables which are pre-joined and indexed properly.
An hourly cron job could be used to populate this temp table.
Well now you should have the fundamentals of database indexing down.
Please post your questions, comments, successes and failures.
–Tim