Mittwoch, Juli 25, 2007

Duplicate fields in index

It has been an really long time since I last updated. No, I have not been on vacation all the time (nevertheless I have been more than once since the last post :)), but lots of work and private stuff just did not leave enough time to do any updates.

So, what's new...

Yesterday I looked around Google Video for TechTalks on the hunt for something interesting and stumbled across Performance Tuning Best Practices for MySQL. In this quite interesting - even though somewhat outdated talk - Jay mentioned again a known fact about InnoDb indexes. Internally every non-primary index on an InnoDb table is appended with the primary key fields. This is because in contrast to MyISAM where all indexes are treated the same, the primary key is a clustered index in InnoDb. See the manual page or e. g. a sample chapter on Query Optimization from MySQL, 3rd Edition by Sams Publishing. This leads to an interesting situation. Consider this (hypothetical) table:

CREATE TABLE `invoices` (
`invoicecode` char(10) NOT NULL,
`customercode` char(10) NOT NULL,
`email` varchar(45) NOT NULL,
`duedate` char(8) NOT NULL,
PRIMARY KEY  (`invoicecode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The records will be physically saved in lexical invoicecode order. InnoDb will store all the field data directly with the primary key. Adding an index e. g. on the customerno column will copy all the customerno values into a different location in their natural order. For each customerno the corresponding invoicecode will be stored, there too. Looking records up by customerno will first cause the secondary index to be accessed. Only then will the actual row data be accessed by a second lookup using the invoicecode found for a customerno.

Often a single customer will have several invoices, so the database designer might come up with a combined - covering - secondary index for quicker lookups by customerno and invoicecode:

ALTER TABLE `invoices`
  ADD INDEX `idx_combined`(`customerno`, `invoicecode`);

This seems to be a good idea, especially queries by customerno are intended to retrieve only the invoicecodes. In such cases MySQL is able to satisfy the request by only using the index, no second access as described above is necessary, because all required data is contained in the index itself. Using the EXPLAIN command on a query like SELECT customerno, invoicecode FROM invoices will show "index" in the "type" column (or "Using index" in the "Extra" column).

So everything is good, right? Well, it could be better: Because invoicecode is the primary key it will be appended to all other indexes. This means the idx_combined index internally looks like this (`customerno`, `invoicecode`, `invoicecode`) with the second `invoicecode` being the primary key that has been added automatically.

Usually MySQL will give you a "Duplicate Column" error when you try to include the same column more than once in an index definition, but as this is specific to InnoDb the MySQL layer above seems not to know about this.

I filled the example table with around 74.000 randomly generated records. Now explaining an example query show this:


mysql> explain select customerno, invoicecode
   ->   from invoices
   ->   where customerno between 1000012334 and 1050012334 \G
*************************** 1. row ***************************
         id: 1
select_type: SIMPLE
      table: invoices
       type: range
possible_keys: idx_combined
        key: idx_combined
    key_len: 4
        ref: NULL
       rows: 6110
      Extra: Using where; Using index
1 row in set (0.00 sec)

Notice the "Using index" hint in the "Extra" column. This is because MySQL can take all the columns we want from the index. According to "key:" it is in fact using our composite key.

Now we try the same thing again after modifying the idx_combined by dropping the invoicecode column from it:

mysql> ALTER TABLE `test`.`invoices`
   ->    DROP INDEX `idx_combined`,
   ->    ADD INDEX `idx_customerno` (`customerno`);

mysql> explain select customerno, invoicecode
   ->     from invoices
   ->     where customerno between 1000012334 and 1050012334 \G
*************************** 1. row ***************************
         id: 1
select_type: SIMPLE
      table: invoices
       type: range
possible_keys: idx_customerno
        key: idx_customerno
    key_len: 4
        ref: NULL
       rows: 6174
      Extra: Using where; Using index
1 row in set (0.00 sec)

As you can see, the new index is used, which contains only one (visible) column, but still the "Using index" hint is shown, which means no lookups to the actual data was needed to fulfill the request.

Dropping the explicit invoicecode column from the idx_combined index will reduce the size of an index record by about a third, i. e. from 30 to 20 bytes per record. This allows more index records to fit into a single buffer page, in effect allowing for more efficient use of your memory - the more entries per page the more efficient your buffer pool memory can be used and the better your performance can get.

I added support for detecting this kind of redundancy in MySQL Index Analyzer. While at it I also improved the looks a little, utilizing some icons from Eclipse. Analyzing the simple example above will result in this output:

There is no pre-build package so far, and I do not know when I will get to build one. But feel free to always get the latest version from SVN.

Keine Kommentare: