Every few months at tilllate we play the query optimization game. At this game I use the slow query log to find out those queries the most load on the servers.
With the queries I found I then either: optimize the query or cache the results to avoid the query.
I prefer the former because caching means data duplication. Which is not very DRY.
Optimization impossible?
There’s one type of query that’s particularly tough to optimize. It’s the “Top-N” query. Example: On the homepage of tilllate we’d like to show the 10 most recent photo reports.
All photo reports are being stored in a table photoreports
. Indexes properly set.
mysql> SELECT * FROM photoreports LIMIT 10000,5; +-----+------------+---------------------+------------+ | id | date | name | country_id | +-----+------------+---------------------+------------+ | 3376| 2006-04-21 | Einweihungsapéro Wer| 1 | | 1878| 2007-05-24 | Le Ho Fada | 30 | | 1879| 2007-05-25 | Le Ho Fada | 30 | | 1928| 2007-05-27 | All You Can Drink! V| 3 | | 9803| 2004-02-21 | Go Tonight | 3 | +-----+------------+---------------------+------------+ 5 rows in set (0.02 sec) mysql> SHOW INDEXES FROM photoreports; +------------+-------------+-------------+ | Key_name | Column_name | Cardinality | +------------+-------------+-------------+ | PRIMARY | id | 415230 | | country_id | country_id | 240 | | date | date | 7414 | +------------+-------------+-------------+ 3 rows in set (0.00 sec)
To get the 10 most recent photoreports we use the following query:
SELECT date, name FROM photoreports WHERE country_id=3 ORDER BY date DESC LIMIT 10
Explaining the select
But, oh boy, EXPLAIN SELECT shows this query is not efficient at all:
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: photoreports type: ref possible_keys: country_id key: country_id key_len: 5 ref: const rows: 7551 Extra: Using where; Using filesort
Using filesort. Disk access! Even though all indexes are being properly set. But since MySQL can only use one index per table it will use the one on the country_id and leave no index left for the ordering.
Is there a way to avoid the ugly filesort? – I don’t know. Any ideas are being appreciated.
Update: 11.11.2007
Oops. The solution is actually so simple I did not even consider it *blush*. Event though I wrote about it a year ago.
Using a combined Index the WHERE-clause is using the `country_id` part of the key, the ORDER BY-Clause is using the `date` part.
mysql> alter table photoreports add index (country_id, date); Query OK, 415230 rows affected (9.92 sec) Records: 415230 Duplicates: 0 Warnings: 0 mysql> EXPLAIN SELECT date, name FROM photoreports WHERE country_id=3 ORDER BY date DESC LIMIT 10 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: photoreports type: ref possible_keys: country_id,country_id_2 key: country_id_2 key_len: 5 ref: const rows: 7551 Extra: Using where 1 row in set (0.01 sec)
I think I should call this blog entry: Never look too far for the solution. OR: Performance optimization is something I should leave up to the techies. The CTO’s job is to keep the techies motivated. 🙂
Why not using a multi-column-index over country_id AND date?
A photoreport-table doesn’t sound like its changing too often for an additional index..
Hi, did you try a multi-column index?
alter table photoreports add index (country_id, date);
Thanks guys… I just updated the entry… 🙂