Mittwoch, Dezember 02, 2009

Index Analyzer for MySQL - r123

Thanks to a bug report that at first seemed to be merely a simple output bug, I went into the code and found two other problems. One was with the detection of duplicates in the analysis result, another one was about refreshing the GUI from a non Event Queue Thread.

Both issues have been fixed in r123.

Thanks to barton.tomas for reporting this.

Sonntag, August 30, 2009

Index Analyzer for MySQL - r120

I finally found some time to update this little project of mine. Most notably, there was a bug - reported some time ago and partially fixed in the repository - that caused problems when column or table names contained unusual characters or conflicted with SQL keywords.

This fix is now included in the current archive downloadable from the Google-Code site.

Moreover I did some cleanup and prevented the app from crashing when the database was not reachable.

The current release is called r120 - the Subversion revision number of the source used to build it. This is easier to track than the previous 0.0x counting.

Sonntag, September 02, 2007

Downloadable Version 0.05

After all I have got to build a pre-packaged version again. Thanks to the Fat-Jar Eclipse plugin it was just a matter of a few clicks. Version 0.05 (downloadable here or from the Google Code download page) contains the changes described in the previous post.

Mittwoch, Juli 25, 2007

Duplicate fields in index

It has been an really long time since I last updated. No, I have not been on vacation all the time (nevertheless I have been more than once since the last post :)), but lots of work and private stuff just did not leave enough time to do any updates.

So, what's new...

Yesterday I looked around Google Video for TechTalks on the hunt for something interesting and stumbled across Performance Tuning Best Practices for MySQL. In this quite interesting - even though somewhat outdated talk - Jay mentioned again a known fact about InnoDb indexes. Internally every non-primary index on an InnoDb table is appended with the primary key fields. This is because in contrast to MyISAM where all indexes are treated the same, the primary key is a clustered index in InnoDb. See the manual page or e. g. a sample chapter on Query Optimization from MySQL, 3rd Edition by Sams Publishing. This leads to an interesting situation. Consider this (hypothetical) table:

CREATE TABLE `invoices` (
`invoicecode` char(10) NOT NULL,
`customercode` char(10) NOT NULL,
`email` varchar(45) NOT NULL,
`duedate` char(8) NOT NULL,
PRIMARY KEY  (`invoicecode`)

The records will be physically saved in lexical invoicecode order. InnoDb will store all the field data directly with the primary key. Adding an index e. g. on the customerno column will copy all the customerno values into a different location in their natural order. For each customerno the corresponding invoicecode will be stored, there too. Looking records up by customerno will first cause the secondary index to be accessed. Only then will the actual row data be accessed by a second lookup using the invoicecode found for a customerno.

Often a single customer will have several invoices, so the database designer might come up with a combined - covering - secondary index for quicker lookups by customerno and invoicecode:

ALTER TABLE `invoices`
  ADD INDEX `idx_combined`(`customerno`, `invoicecode`);

This seems to be a good idea, especially queries by customerno are intended to retrieve only the invoicecodes. In such cases MySQL is able to satisfy the request by only using the index, no second access as described above is necessary, because all required data is contained in the index itself. Using the EXPLAIN command on a query like SELECT customerno, invoicecode FROM invoices will show "index" in the "type" column (or "Using index" in the "Extra" column).

So everything is good, right? Well, it could be better: Because invoicecode is the primary key it will be appended to all other indexes. This means the idx_combined index internally looks like this (`customerno`, `invoicecode`, `invoicecode`) with the second `invoicecode` being the primary key that has been added automatically.

Usually MySQL will give you a "Duplicate Column" error when you try to include the same column more than once in an index definition, but as this is specific to InnoDb the MySQL layer above seems not to know about this.

I filled the example table with around 74.000 randomly generated records. Now explaining an example query show this:

mysql> explain select customerno, invoicecode
   ->   from invoices
   ->   where customerno between 1000012334 and 1050012334 \G
*************************** 1. row ***************************
         id: 1
select_type: SIMPLE
      table: invoices
       type: range
possible_keys: idx_combined
        key: idx_combined
    key_len: 4
        ref: NULL
       rows: 6110
      Extra: Using where; Using index
1 row in set (0.00 sec)

Notice the "Using index" hint in the "Extra" column. This is because MySQL can take all the columns we want from the index. According to "key:" it is in fact using our composite key.

Now we try the same thing again after modifying the idx_combined by dropping the invoicecode column from it:

mysql> ALTER TABLE `test`.`invoices`
   ->    DROP INDEX `idx_combined`,
   ->    ADD INDEX `idx_customerno` (`customerno`);

mysql> explain select customerno, invoicecode
   ->     from invoices
   ->     where customerno between 1000012334 and 1050012334 \G
*************************** 1. row ***************************
         id: 1
select_type: SIMPLE
      table: invoices
       type: range
possible_keys: idx_customerno
        key: idx_customerno
    key_len: 4
        ref: NULL
       rows: 6174
      Extra: Using where; Using index
1 row in set (0.00 sec)

As you can see, the new index is used, which contains only one (visible) column, but still the "Using index" hint is shown, which means no lookups to the actual data was needed to fulfill the request.

Dropping the explicit invoicecode column from the idx_combined index will reduce the size of an index record by about a third, i. e. from 30 to 20 bytes per record. This allows more index records to fit into a single buffer page, in effect allowing for more efficient use of your memory - the more entries per page the more efficient your buffer pool memory can be used and the better your performance can get.

I added support for detecting this kind of redundancy in MySQL Index Analyzer. While at it I also improved the looks a little, utilizing some icons from Eclipse. Analyzing the simple example above will result in this output:

There is no pre-build package so far, and I do not know when I will get to build one. But feel free to always get the latest version from SVN.

Sonntag, Oktober 08, 2006

Stored Connections, Improved Size Estimates

The current SVN trunk revision contains two interesting changes which however I have not put into a new package, because they are not final (enough ;)) yet.

What has been happening recently is

  1. Stored Connections

    I have implemented stored connection profiles, mostly because I had to enter the login data each an every time I ran the app to test something. This is based on the Java Preferences API which in this case uses a simple file backend. Currently passwords are stored in plain text, so please be careful not to save anything important should you decide to try it out.

    In order use the new feature, you will find a new button in the connection dialog to store the data currently shown. To load a saved session, just select it from the drop down combobox.

  2. Improved size estimate for InnoDB

    Based on Peter Zaitsev's article about the MySQL Optimizer and InnoDB primary keys I changed the calculation of the index size for InnoDB tables. I knew this before, but had not thought about integrating it into the Index Analyzer.

    To get more details read the MySQL manual and of course Peter's article. The main point however is that InnoDB indices other than the primary index always contain the primary index columns as an (implicit, invisible) suffix, so large primary keys can increase any other index's size significantly. This is now considered when calculating estimated savings.

I will try to get a new package ready soon, however I am preparing for a vacation, so it might be delayed a little.

Dienstag, September 26, 2006

Stupid bug in 0.04

Yesterday I noticed that the overall database size calculation in 0.04 contains a very stupid bug. While iterating the tables, only their index sizes get added up correctly, while the table data size does not. The display always shows the size of the last table analyzed, so with a single table example, you will not see any difference.

You can check out the trunk version vom SVN to have this fixed before the next release.

Currently I am working on stored connection profiles, which is especially interesting to myself right now, because currently I have to type in the login information each time I run the thing to test it.

Samstag, September 23, 2006

MySQL Index Analyzer: 0.04 released

Version 0.04 of the MySQL Index Analyzer has just been uploaded. This is the first version that features a Swing GUI and actually provides real analysis functions and the generation of ALTER TABLE statements to get rid of superfluous indices.

A quick overview of what is new:

  • Swing GUI
  • Analysis features as on the command line
  • Copying of generated ALTER TABLE statements to the clipboard
  • Information on data and index size distribution
  • Rudimentary analysis of possible disk space savings

Especially the size analysis is in a very early stage and will probably be way off most of the time. I do however plan to spend more work there, because I think this is a very important point.

Documentation in the distribution file is somewhat outdated, but I will give some more details here. The following example is based on a simple database with only one table called people. The table consists of four columns:

 CREATE TABLE `people` (
  `id` int(11) NOT NULL auto_increment,
  `firstname` varchar(32) default NULL,
  `lastname` varchar(32) default NULL,
  `birthday` date default NULL,
  PRIMARY KEY  (`id`),
  KEY `idx_reversename` (`lastname`,`firstname`),
  KEY `idx_id_birth_first` (`id`,`birthday`,`firstname`),
  KEY `idx_full` (`firstname`,`lastname`),
  KEY `idx_first` (`firstname`),
  KEY `idx_id_first` (`id`,`firstname`),
  KEY `idx_id_birth` (`id`,`birthday`)

There are 7 indices defined on that table, including the PRIMARY key.

Now let's see what the MySQLIndexAnalyzer has to say:

After login, the analysis starts immediately. I will implement a more fine grained control later, for now, the whole schema is analyzed:

Thanks to the great folks of waiting is done with some style ;)

As soon as the analysis is ready, you will see a screen like this:

In the upper part of the Window you see some general information about data and index sizes. The Index/Data Ration of 3.0 means, that indices for this database need 3 times the amount of space the real data eats up. This ratio is also displayed as a bar. Please be aware, that all numbers only refer to the tables shown in the details view below, the database might be a lot bigger!

The lower part shows a hierarchical view of the database structure. Tables are shown as little folders, detected index obsoletions are the leafs below them. In the example you can see, that 2 indices where found (idx_first, idx_id_birth) which can be safely removed, because they are only prefixes of idx_full and idx_id_birth_first. For each obsoletion and estimated amount of disk space savings is displayed that would result from dropping that particular index. On the table level you see the sum of all obsoleted indices savings.

Via the "Generate DML" button you get to this dialog:

From here you can either save the DML (ALTER TABLE) statements to a file or store them to the system clipboard: (Please do not blindly apply them to your database!)

Please let me know I you find this tool helpful, have suggestions on what to improve/include, and of course, if you'd like to participate in the development.