Sonntag, Oktober 08, 2006

Stored Connections, Improved Size Estimates

The current SVN trunk revision contains two interesting changes which however I have not put into a new package, because they are not final (enough ;)) yet.

What has been happening recently is

  1. Stored Connections

    I have implemented stored connection profiles, mostly because I had to enter the login data each an every time I ran the app to test something. This is based on the Java Preferences API which in this case uses a simple file backend. Currently passwords are stored in plain text, so please be careful not to save anything important should you decide to try it out.

    In order use the new feature, you will find a new button in the connection dialog to store the data currently shown. To load a saved session, just select it from the drop down combobox.

  2. Improved size estimate for InnoDB

    Based on Peter Zaitsev's article about the MySQL Optimizer and InnoDB primary keys I changed the calculation of the index size for InnoDB tables. I knew this before, but had not thought about integrating it into the Index Analyzer.

    To get more details read the MySQL manual and of course Peter's article. The main point however is that InnoDB indices other than the primary index always contain the primary index columns as an (implicit, invisible) suffix, so large primary keys can increase any other index's size significantly. This is now considered when calculating estimated savings.

I will try to get a new package ready soon, however I am preparing for a vacation, so it might be delayed a little.

Dienstag, September 26, 2006

Stupid bug in 0.04

Yesterday I noticed that the overall database size calculation in 0.04 contains a very stupid bug. While iterating the tables, only their index sizes get added up correctly, while the table data size does not. The display always shows the size of the last table analyzed, so with a single table example, you will not see any difference.

You can check out the trunk version vom SVN to have this fixed before the next release.

Currently I am working on stored connection profiles, which is especially interesting to myself right now, because currently I have to type in the login information each time I run the thing to test it.

Samstag, September 23, 2006

MySQL Index Analyzer: 0.04 released

Version 0.04 of the MySQL Index Analyzer has just been uploaded. This is the first version that features a Swing GUI and actually provides real analysis functions and the generation of ALTER TABLE statements to get rid of superfluous indices.

A quick overview of what is new:

  • Swing GUI
  • Analysis features as on the command line
  • Copying of generated ALTER TABLE statements to the clipboard
  • Information on data and index size distribution
  • Rudimentary analysis of possible disk space savings

Especially the size analysis is in a very early stage and will probably be way off most of the time. I do however plan to spend more work there, because I think this is a very important point.

Documentation in the distribution file is somewhat outdated, but I will give some more details here. The following example is based on a simple database with only one table called people. The table consists of four columns:

 CREATE TABLE `people` (
  `id` int(11) NOT NULL auto_increment,
  `firstname` varchar(32) default NULL,
  `lastname` varchar(32) default NULL,
  `birthday` date default NULL,
  PRIMARY KEY  (`id`),
  KEY `idx_reversename` (`lastname`,`firstname`),
  KEY `idx_id_birth_first` (`id`,`birthday`,`firstname`),
  KEY `idx_full` (`firstname`,`lastname`),
  KEY `idx_first` (`firstname`),
  KEY `idx_id_first` (`id`,`firstname`),
  KEY `idx_id_birth` (`id`,`birthday`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 

There are 7 indices defined on that table, including the PRIMARY key.

Now let's see what the MySQLIndexAnalyzer has to say:

After login, the analysis starts immediately. I will implement a more fine grained control later, for now, the whole schema is analyzed:

Thanks to the great folks of swingfx.dev.java.net waiting is done with some style ;)

As soon as the analysis is ready, you will see a screen like this:

In the upper part of the Window you see some general information about data and index sizes. The Index/Data Ration of 3.0 means, that indices for this database need 3 times the amount of space the real data eats up. This ratio is also displayed as a bar. Please be aware, that all numbers only refer to the tables shown in the details view below, the database might be a lot bigger!

The lower part shows a hierarchical view of the database structure. Tables are shown as little folders, detected index obsoletions are the leafs below them. In the example you can see, that 2 indices where found (idx_first, idx_id_birth) which can be safely removed, because they are only prefixes of idx_full and idx_id_birth_first. For each obsoletion and estimated amount of disk space savings is displayed that would result from dropping that particular index. On the table level you see the sum of all obsoleted indices savings.

Via the "Generate DML" button you get to this dialog:

From here you can either save the DML (ALTER TABLE) statements to a file or store them to the system clipboard: (Please do not blindly apply them to your database!)

Please let me know I you find this tool helpful, have suggestions on what to improve/include, and of course, if you'd like to participate in the development.

Dienstag, September 19, 2006

Internet Explorer Problems

In case you have display problems with MS Internet Explorer, try changing the font size up- and down (e. g. hold Strg and scroll with your mouse wheel once up, once down). This should correct the rendering. As I normally do not use IE (too clumsy to do on Linux) I did only notice the problem today.

Montag, September 18, 2006

GUI coming along

Just a quick update: If you check out the latest trunk version from the subversion repository you will get a working GUI version. After you connected to the database, the analysis will start and the results be displayed in a tree control.

The upper half of the result display is still non-functional, and there is nothing more you can do than look at the result, but I thought someone might be interested in this intermediate step.

Freitag, September 15, 2006

More refactoring is on the way

On the way to a modular design I refactored the source to provide the output through rendering modules. The text output is the first one of those. So from the end-user perspective nothing has changed with the version 0.03 package available now (except that it has grown bigger).

The larger download is however mostly caused by more libraries being included, especially those for the graphical interface that I have just begun. Currently it only provides a login view which lets you connect to the database. However no analysis can be started yet. If you want to have a look at what I intend to let the GUI look like, just modify the main class in the launcher script to de.shipdown....gui.MySQLIndexAnalyzer. I will try to get the first graphical version to actually show something soon.

As for the previous version: To use the package, just download and unzip into a new directory. For use on Linux make sure you chmod the "mia" script to make it executable. On both Windows and Linux you have to have a java executable on your path. If you have not, edit the "mia" or "mia.cmd" script to include the full path.

Dienstag, August 29, 2006

First downloadable release 0.02

I just uploaded the first ready-to-use package of the MySQL Index Analyzer (version 0.02). You can download it via http://bismarck.shipdown.de/mia/mia_0.02.zip or click on the "Downloads" link in the sidebar.

This is still a command-line only version, but the DDL generation support has been enabled again.

To use the package, just download and unzip into a new directory. For use on Linux make sure you chmod the "mia" script to make it executable. On both Windows and Linux you have to have a java executable on your path. If you have not, edit the "mia" or "mia.cmd" script to include the full path.

Samstag, August 26, 2006

Refactoring, Step 1: IndexDescriptorProvider

Development of a more structured version is in progress. I just committed a set of changes to the repository that changes the current, rather monolithic, single-class design to what will be the first step of a more modular one.

Currently the generation of ALTER TABLE statements has been removed and the output format of the analysis is slightly different. But that is mere cosmetic

What is much more important is that now the gathering of database schema information is based on a pluggable system, designed around an interface called IndexDescriptorProvider. Up to now I have just ported the MySQL 4 stuff that was already in the first version to this new architecture. Please feel free to have a look at it and tell me what you think.

Next thing I'll do is implement a provider based on the INFORMATION_SCHEMA database available in MySQL 5.x to see if I missed anything.

Donnerstag, August 24, 2006

Basic documentation

While the work to refactor the DuplicateIndexFinder is going on (however not quite as fast as I would like it to), here is some general information about what exists already.

Why does it exist?

The DuplicateIndexFinder class was written, because we have an application that works with a database schema of some 500 tables. Those tables are generated by an object relation persistence mapper from UML diagrams. While the framework used (proprietary) works very well and allows for quick and efficient application development, the generated schema is not perfect.

The schema contains a lot of generically generated index definitions that are not necessary from the database perspective. For example for all tables an index that covers the primary key fields is created, in addition to the primary key. This is especially useless, because it will never be used by the optimizer.

However this is not the only problem. As the schema is generated automatically and the generator tool does not analyze table depencies in advance for the whole schema, a good deal of indices are created that cover a prefix of a longer index.

While there may be situations where indices become very long and a second, shorter index on a prefix may be sensible, usually the extra space used in the tablespace is not worth it. As we are dealing with several hundred databases, each about 100GB in size, superfluous indices can become a real pain in the neck. This is why I hacked this little tool.

What can it do?

Currently the whole point of the MySQL Index Analyzer is really just the detection of duplicate/superfluous indices. However I plan to include much more functionality over time. Right now you can let it look at a database and it will output a text report of which indices are obsoleted by other, longer indices on the same table. Moreover it can generate DDL statements in MySQL syntax that remove the unneeded indices from the tables, therefore releasing the space they occupied before.

How to use it?

The DuplicateIndexFinder is a single Java class that can be run as a command line application. Apart from a Java Runtime Environment you will also need the MySQL JDBC Driver (Connector/J) and the JSAP library (Java-based Simple Argument Parser). The tool has been written using Java 5,but apart from some generic collections nothing fancy is used. I believe it should be possible to compile Java 1.4 compliant version, even though I did not try it myself.

To get an overview of its usage, just run it as follows on a command line:

java de.shipdown.util.mysql.DuplicateIndexFinder --help

This will output general usage information and short explanations for the available options:


Usage: java de.shipdown.util.mysql.DuplicateIndexFinder
                [--help] [(-o|--outfile) <outfile>] [-v|--verbose] [-a|--generate-alter-table] (-D|--database) <dbname> (-u|--user) <dbuser> [(-p|--password) <dbpass>] [(-h|--host) <dbhost>] [(-P|--port) <dbport>]

  [--help]
        Prints this help message.

  [(-o|--outfile) <outfile>]
        Name of an output file. StdOut if omitted.

  [-v|--verbose]
        Requests verbose output (includes column names).

  [-a|--generate-alter-table]
        Generate alter table statements to drop superfluous indices.

  (-D|--database) <dbname>
        Name of the database to analyze.

  (-u|--user) <dbuser>
        User name for database connection.

  [(-p|--password) <dbpass>]
        Password for database connection.

  [(-h|--host) <dbhost>]
        Database hostname or address to connect to. Defaults to localhost.
        (default: localhost)

  [(-P|--port) <dbport>]
        Database network port. Defaults to 3306. (default: 3306)

As you can see most options are designed to be compatible with the usual MySQL command line tools. The only required parameters are the user and database names. All other options have sensible defaults.

If you specify the --generate-alter-table option, you will not only get an analysis of the indices in you database, but also suggessted ALTER TABLE statements. Please make sure you do not blindly apply those to any database. While I do not think anything horrible can happen, I cannot make any guarantee. It never hurts to have a backup!

Using the --verbose option the report will contain column names, so you can see for yourself if an index is really just a prefix of a longer one.

Example

Consider a database "test" with a single table "people" with the following definition:

 CREATE TABLE `people` (
  `id` int(10) unsigned NOT NULL,
  `firstname` varchar(32) default NULL,
  `lastname` varchar(32) default NULL,
  `birthday` date default NULL,
  PRIMARY KEY  (`id`),
  KEY `idx_id` (`id`),
  KEY `idx_first` (`firstname`),
  KEY `idx_last` (`lastname`),
  KEY `idx_fullname` (`firstname`,`lastname`),
  KEY `idx_reversename` (`lastname`,`firstname`),
  KEY `idx_id_birth` (`id`,`birthday`),
  KEY `idx_id_birth_first` (`id`,`birthday`,`firstname`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Now without further ado just run the DuplicateIndexFinder (the example assumes that you checked out the tag "0.01" from the SVN repository to ~/workspace/MIA0.01):

ds@yavin:~/workspace/MIA0.01
$ java -cp lib/mysql-connector-java-3.1.13-bin.jar:lib/JSAP-2.1.jar:build de.shipdown.util.mysql.DuplicateIndexFinder -u ds -D test

Table people
--------------------------------------------------------------------------------
PRIMARY (len:1)
    obsoletes idx_id (len:1)
idx_reversename (len:2)
    obsoletes idx_last (len:1)
idx_fullname (len:2)
    obsoletes idx_first (len:1)
idx_id_birth (len:2)
    obsoletes idx_id (len:1)
idx_id_birth_first (len:3)
    obsoletes idx_id (len:1)
    obsoletes idx_id_birth (len:2)

This output means

  • that the PRIMARY key obsoletes the idx_id, because they are identical. The primary key is always preserved.
  • that idx_reversename makes idx_last obsolete, because any query that searches for just a last name can just as well use the idx_reversename index, as can a query that looks for a full name combination
  • that idx_fullname makes idx_first useless for the same reason as above
  • that idx_id_birth obsoletes idx_id for the same reason as above. Note that the id column is also covered by the primary key.
  • that idx_id_birth_first obsoletes both idx_id and id_id_birth, because it starts with the same columns as the other two.

If you follow the tool's advice you would modify your schema to remove all indices except for the primary key, idx_reversename, idx_fullname, and idx_id_birth_first. You can get the necessary statements to drop the obsolete one like this:

ds@yavin:~/workspace/MIA 0.01
$ java -cp lib/mysql-connector-java-3.1.13-bin.jar:lib/JSAP-2.1.jar:build de.shipdown.util.mysql.DuplicateIndexFinder -u ds -D test --verbose --generate-alter-table

Table people
--------------------------------------------------------------------------------
PRIMARY (cols: 1:id)
    obsoletes idx_id (cols: 1:id)
idx_reversename (cols: 1:lastname, 2:firstname)
    obsoletes idx_last (cols: 1:lastname)
idx_fullname (cols: 1:firstname, 2:lastname)
    obsoletes idx_first (cols: 1:firstname)
idx_id_birth (cols: 1:id, 2:birthday)
    obsoletes idx_id (cols: 1:id)
idx_id_birth_first (cols: 1:id, 2:birthday, 3:firstname)
    obsoletes idx_id (cols: 1:id)
    obsoletes idx_id_birth (cols: 1:id, 2:birthday)


ALTER TABLE STATEMENTS (PLEASE(!) REVIEW, NO WARRANTY FOR ANY DAMAGE TO YOUR DATABASE)
======================================================================================
ALTER TABLE people DROP INDEX idx_first, DROP INDEX idx_last, DROP INDEX idx_id_birth, DROP INDEX idx_id;

As you see the --verbose switch provides a little more detail in the report. But more interestingly you get a suggested DDL statement. Note that all indices are dropped at once. This is especially important for large tables, as MySQL needs to create a full copy of a table if you issue an ALTER TABLE command.

Using this syntax requires only a single copy process, while removing the indices in separate statements would have required 4 copies and therefore about 4x as long.

Montag, August 21, 2006

Welcome to the MySQL Index Analyzer Blog

This is the new blog for the MySQL Index Analyzer, a small Java application written by Daniel Schneller. It was first "released" in the beginning of 2006 in this blog post.

When I read questions about such a tool on the MySQL Performance Blog I decided to release it as a regular open source project. It currently hosted on Google's new code hosting service. You can find it here: http://code.google.com/p/mysqlindexanalyzer/.

More information can be found here soon.

Sonntag, August 20, 2006

Impressum

Daniel Schneller
Kamper Str. 14
42699 Solingen
Deutschland /Germany

Dies ist eine private Internetpräsenz.
Internet: http://mysql-index-analyzer.blogspot.com
Haftungshinweis: Trotz sorgfältiger inhaltlicher Kontrolle übernehmen wir keine Haftung für die Inhalte externer Links. Für den Inhalt der verlinkten Seiten sind ausschließlich deren Betreiber verantwortlich.
Layout und Gestaltung dieser Website sowie die enthaltenen Informationen sind gemäß dem Urheberrechtsgesetz geschützt. Das ist auch zu beachten, wenn auf diesen Internetseiten erscheinende Materialien Dritter zur Informationsgewinnung verwendet oder kopiert werden.
Alle Angaben erfolgen ohne Gewähr. Eine Haftung für Schäden, die sich aus der Verwendung der veröffentlichten Inhalte ergeben, ist ausgeschlossen.
Sollte der Inhalt oder die Aufmachung dieser Seiten Rechte Dritter oder gesetzliche Bestimmungen verletzen, bitten wir um eine entsprechende Nachricht ohne Kostennote.
Wir greifen auf Drittanbieter zurück, um Anzeigen zu schalten, wenn Sie unsere Website besuchen. Diese Unternehmen nutzen möglicherweise Informationen zu Ihren Besuchen dieser und anderer Websites, damit Anzeigen zu Produkten und Diensten geschaltet werden können, die Sie interessieren. Falls Sie mehr über diese Methoden erfahren möchten oder wissen möchten, welche Möglichkeiten bestehen, diese Informationen nicht von den Unternehmen verwenden zu lassen, klicken Sie bitte hier: http://www.google.de/privacy.html
Diese Website benutzt Google Analytics, einen Webanalysedienst der Google Inc., um eine Analyse der Benutzung der Website zu ermöglichen. Die durch den Einsatz von Google Analytics erzeugten Informationen über Ihre Nutzung dieser Website (einschließlich Ihrer IP-Adresse) werden an einen Server der Google Inc. in den USA übertragen und dort gespeichert. Google wird diese Informationen lediglich dazu verwenden, die Nutzung der Website zu analysieren, indem anonymisierte Auswertungen und Grafiken zur Anzahl der Besuche, Anzahl der pro Nutzer aufgerufenen Seiten usw. erstellt werden.
Das Copyright für Bilder auf der Seite liegt bei Daniel Schneller bzw. den jeweilig angegeben Personen.
Dieses Impressum basiert auf der Arbeit von Martin Hamann