Sonntag, September 02, 2007
Mittwoch, Juli 25, 2007
It has been an really long time since I last updated. No, I have not been on vacation all the time (nevertheless I have been more than once since the last post :)), but lots of work and private stuff just did not leave enough time to do any updates.
So, what's new...
Yesterday I looked around Google Video for TechTalks on the hunt for something interesting and stumbled across Performance Tuning Best Practices for MySQL. In this quite interesting - even though somewhat outdated talk - Jay mentioned again a known fact about InnoDb indexes. Internally every non-primary index on an InnoDb table is appended with the primary key fields. This is because in contrast to MyISAM where all indexes are treated the same, the primary key is a clustered index in InnoDb. See the manual page or e. g. a sample chapter on Query Optimization from MySQL, 3rd Edition by Sams Publishing. This leads to an interesting situation. Consider this (hypothetical) table:
CREATE TABLE `invoices` ( `invoicecode` char(10) NOT NULL, `customercode` char(10) NOT NULL, `email` varchar(45) NOT NULL, `duedate` char(8) NOT NULL, PRIMARY KEY (`invoicecode`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The records will be physically saved in lexical
invoicecode order. InnoDb will store all the field data directly with the primary key. Adding an index e. g. on the
customerno column will copy all the
customerno values into a different location in their natural order. For each
customerno the corresponding
invoicecode will be stored, there too. Looking records up by
customerno will first cause the secondary index to be accessed. Only then will the actual row data be accessed by a second lookup using the
invoicecode found for a
Often a single customer will have several invoices, so the database designer might come up with a combined - covering - secondary index for quicker lookups by
ALTER TABLE `invoices` ADD INDEX `idx_combined`(`customerno`, `invoicecode`);
This seems to be a good idea, especially queries by
customerno are intended to retrieve only the
invoicecodes. In such cases MySQL is able to satisfy the request by only using the index, no second access as described above is necessary, because all required data is contained in the index itself. Using the EXPLAIN command on a query like
SELECT customerno, invoicecode FROM invoices will show "index" in the "type" column (or "Using index" in the "Extra" column).
So everything is good, right? Well, it could be better: Because
is the primary key it will be appended to all other indexes. This means the
idx_combined index internally looks like this
(`customerno`, `invoicecode`, `invoicecode`) with the second
being the primary key that has been added automatically.
Usually MySQL will give you a "Duplicate Column" error when you try to include the same column more than once in an index definition, but as this is specific to InnoDb the MySQL layer above seems not to know about this.
I filled the example table with around 74.000 randomly generated records. Now explaining an example query show this:
mysql> explain select customerno, invoicecode -> from invoices -> where customerno between 1000012334 and 1050012334 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: invoices type: range possible_keys: idx_combined key: idx_combined key_len: 4 ref: NULL rows: 6110 Extra: Using where; Using index 1 row in set (0.00 sec)
Notice the "Using index" hint in the "Extra" column. This is because MySQL can take all the columns we want from the index. According to "key:" it is in fact using our composite key.
Now we try the same thing again after modifying the idx_combined by dropping the
invoicecode column from it:
mysql> ALTER TABLE `test`.`invoices` -> DROP INDEX `idx_combined`, -> ADD INDEX `idx_customerno` (`customerno`); mysql> explain select customerno, invoicecode -> from invoices -> where customerno between 1000012334 and 1050012334 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: invoices type: range possible_keys: idx_customerno key: idx_customerno key_len: 4 ref: NULL rows: 6174 Extra: Using where; Using index 1 row in set (0.00 sec)
As you can see, the new index is used, which contains only one (visible) column, but still the "Using index" hint is shown, which means no lookups to the actual data was needed to fulfill the request.
Dropping the explicit
invoicecode column from the
idx_combined index will reduce the size of an index record by about a third, i. e. from 30 to 20 bytes per record. This allows more index records to fit into a single buffer page, in effect allowing for more efficient use of your memory - the more entries per page the more efficient your buffer pool memory can be used and the better your performance can get.
I added support for detecting this kind of redundancy in MySQL Index Analyzer. While at it I also improved the looks a little, utilizing some icons from Eclipse. Analyzing the simple example above will result in this output:
There is no pre-build package so far, and I do not know when I will get to build one. But feel free to always get the latest version from SVN.