Dienstag, August 29, 2006

First downloadable release 0.02

I just uploaded the first ready-to-use package of the MySQL Index Analyzer (version 0.02). You can download it via http://bismarck.shipdown.de/mia/mia_0.02.zip or click on the "Downloads" link in the sidebar.

This is still a command-line only version, but the DDL generation support has been enabled again.

To use the package, just download and unzip into a new directory. For use on Linux make sure you chmod the "mia" script to make it executable. On both Windows and Linux you have to have a java executable on your path. If you have not, edit the "mia" or "mia.cmd" script to include the full path.

Samstag, August 26, 2006

Refactoring, Step 1: IndexDescriptorProvider

Development of a more structured version is in progress. I just committed a set of changes to the repository that changes the current, rather monolithic, single-class design to what will be the first step of a more modular one.

Currently the generation of ALTER TABLE statements has been removed and the output format of the analysis is slightly different. But that is mere cosmetic

What is much more important is that now the gathering of database schema information is based on a pluggable system, designed around an interface called IndexDescriptorProvider. Up to now I have just ported the MySQL 4 stuff that was already in the first version to this new architecture. Please feel free to have a look at it and tell me what you think.

Next thing I'll do is implement a provider based on the INFORMATION_SCHEMA database available in MySQL 5.x to see if I missed anything.

Donnerstag, August 24, 2006

Basic documentation

While the work to refactor the DuplicateIndexFinder is going on (however not quite as fast as I would like it to), here is some general information about what exists already.

Why does it exist?

The DuplicateIndexFinder class was written, because we have an application that works with a database schema of some 500 tables. Those tables are generated by an object relation persistence mapper from UML diagrams. While the framework used (proprietary) works very well and allows for quick and efficient application development, the generated schema is not perfect.

The schema contains a lot of generically generated index definitions that are not necessary from the database perspective. For example for all tables an index that covers the primary key fields is created, in addition to the primary key. This is especially useless, because it will never be used by the optimizer.

However this is not the only problem. As the schema is generated automatically and the generator tool does not analyze table depencies in advance for the whole schema, a good deal of indices are created that cover a prefix of a longer index.

While there may be situations where indices become very long and a second, shorter index on a prefix may be sensible, usually the extra space used in the tablespace is not worth it. As we are dealing with several hundred databases, each about 100GB in size, superfluous indices can become a real pain in the neck. This is why I hacked this little tool.

What can it do?

Currently the whole point of the MySQL Index Analyzer is really just the detection of duplicate/superfluous indices. However I plan to include much more functionality over time. Right now you can let it look at a database and it will output a text report of which indices are obsoleted by other, longer indices on the same table. Moreover it can generate DDL statements in MySQL syntax that remove the unneeded indices from the tables, therefore releasing the space they occupied before.

How to use it?

The DuplicateIndexFinder is a single Java class that can be run as a command line application. Apart from a Java Runtime Environment you will also need the MySQL JDBC Driver (Connector/J) and the JSAP library (Java-based Simple Argument Parser). The tool has been written using Java 5,but apart from some generic collections nothing fancy is used. I believe it should be possible to compile Java 1.4 compliant version, even though I did not try it myself.

To get an overview of its usage, just run it as follows on a command line:

java de.shipdown.util.mysql.DuplicateIndexFinder --help

This will output general usage information and short explanations for the available options:


Usage: java de.shipdown.util.mysql.DuplicateIndexFinder
                [--help] [(-o|--outfile) <outfile>] [-v|--verbose] [-a|--generate-alter-table] (-D|--database) <dbname> (-u|--user) <dbuser> [(-p|--password) <dbpass>] [(-h|--host) <dbhost>] [(-P|--port) <dbport>]

  [--help]
        Prints this help message.

  [(-o|--outfile) <outfile>]
        Name of an output file. StdOut if omitted.

  [-v|--verbose]
        Requests verbose output (includes column names).

  [-a|--generate-alter-table]
        Generate alter table statements to drop superfluous indices.

  (-D|--database) <dbname>
        Name of the database to analyze.

  (-u|--user) <dbuser>
        User name for database connection.

  [(-p|--password) <dbpass>]
        Password for database connection.

  [(-h|--host) <dbhost>]
        Database hostname or address to connect to. Defaults to localhost.
        (default: localhost)

  [(-P|--port) <dbport>]
        Database network port. Defaults to 3306. (default: 3306)

As you can see most options are designed to be compatible with the usual MySQL command line tools. The only required parameters are the user and database names. All other options have sensible defaults.

If you specify the --generate-alter-table option, you will not only get an analysis of the indices in you database, but also suggessted ALTER TABLE statements. Please make sure you do not blindly apply those to any database. While I do not think anything horrible can happen, I cannot make any guarantee. It never hurts to have a backup!

Using the --verbose option the report will contain column names, so you can see for yourself if an index is really just a prefix of a longer one.

Example

Consider a database "test" with a single table "people" with the following definition:

 CREATE TABLE `people` (
  `id` int(10) unsigned NOT NULL,
  `firstname` varchar(32) default NULL,
  `lastname` varchar(32) default NULL,
  `birthday` date default NULL,
  PRIMARY KEY  (`id`),
  KEY `idx_id` (`id`),
  KEY `idx_first` (`firstname`),
  KEY `idx_last` (`lastname`),
  KEY `idx_fullname` (`firstname`,`lastname`),
  KEY `idx_reversename` (`lastname`,`firstname`),
  KEY `idx_id_birth` (`id`,`birthday`),
  KEY `idx_id_birth_first` (`id`,`birthday`,`firstname`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Now without further ado just run the DuplicateIndexFinder (the example assumes that you checked out the tag "0.01" from the SVN repository to ~/workspace/MIA0.01):

ds@yavin:~/workspace/MIA0.01
$ java -cp lib/mysql-connector-java-3.1.13-bin.jar:lib/JSAP-2.1.jar:build de.shipdown.util.mysql.DuplicateIndexFinder -u ds -D test

Table people
--------------------------------------------------------------------------------
PRIMARY (len:1)
    obsoletes idx_id (len:1)
idx_reversename (len:2)
    obsoletes idx_last (len:1)
idx_fullname (len:2)
    obsoletes idx_first (len:1)
idx_id_birth (len:2)
    obsoletes idx_id (len:1)
idx_id_birth_first (len:3)
    obsoletes idx_id (len:1)
    obsoletes idx_id_birth (len:2)

This output means

  • that the PRIMARY key obsoletes the idx_id, because they are identical. The primary key is always preserved.
  • that idx_reversename makes idx_last obsolete, because any query that searches for just a last name can just as well use the idx_reversename index, as can a query that looks for a full name combination
  • that idx_fullname makes idx_first useless for the same reason as above
  • that idx_id_birth obsoletes idx_id for the same reason as above. Note that the id column is also covered by the primary key.
  • that idx_id_birth_first obsoletes both idx_id and id_id_birth, because it starts with the same columns as the other two.

If you follow the tool's advice you would modify your schema to remove all indices except for the primary key, idx_reversename, idx_fullname, and idx_id_birth_first. You can get the necessary statements to drop the obsolete one like this:

ds@yavin:~/workspace/MIA 0.01
$ java -cp lib/mysql-connector-java-3.1.13-bin.jar:lib/JSAP-2.1.jar:build de.shipdown.util.mysql.DuplicateIndexFinder -u ds -D test --verbose --generate-alter-table

Table people
--------------------------------------------------------------------------------
PRIMARY (cols: 1:id)
    obsoletes idx_id (cols: 1:id)
idx_reversename (cols: 1:lastname, 2:firstname)
    obsoletes idx_last (cols: 1:lastname)
idx_fullname (cols: 1:firstname, 2:lastname)
    obsoletes idx_first (cols: 1:firstname)
idx_id_birth (cols: 1:id, 2:birthday)
    obsoletes idx_id (cols: 1:id)
idx_id_birth_first (cols: 1:id, 2:birthday, 3:firstname)
    obsoletes idx_id (cols: 1:id)
    obsoletes idx_id_birth (cols: 1:id, 2:birthday)


ALTER TABLE STATEMENTS (PLEASE(!) REVIEW, NO WARRANTY FOR ANY DAMAGE TO YOUR DATABASE)
======================================================================================
ALTER TABLE people DROP INDEX idx_first, DROP INDEX idx_last, DROP INDEX idx_id_birth, DROP INDEX idx_id;

As you see the --verbose switch provides a little more detail in the report. But more interestingly you get a suggested DDL statement. Note that all indices are dropped at once. This is especially important for large tables, as MySQL needs to create a full copy of a table if you issue an ALTER TABLE command.

Using this syntax requires only a single copy process, while removing the indices in separate statements would have required 4 copies and therefore about 4x as long.

Montag, August 21, 2006

Welcome to the MySQL Index Analyzer Blog

This is the new blog for the MySQL Index Analyzer, a small Java application written by Daniel Schneller. It was first "released" in the beginning of 2006 in this blog post.

When I read questions about such a tool on the MySQL Performance Blog I decided to release it as a regular open source project. It currently hosted on Google's new code hosting service. You can find it here: http://code.google.com/p/mysqlindexanalyzer/.

More information can be found here soon.

Sonntag, August 20, 2006

Impressum

Daniel Schneller
Kamper Str. 14
42699 Solingen
Deutschland /Germany

Dies ist eine private Internetpräsenz.
Internet: http://mysql-index-analyzer.blogspot.com
Haftungshinweis: Trotz sorgfältiger inhaltlicher Kontrolle übernehmen wir keine Haftung für die Inhalte externer Links. Für den Inhalt der verlinkten Seiten sind ausschließlich deren Betreiber verantwortlich.
Layout und Gestaltung dieser Website sowie die enthaltenen Informationen sind gemäß dem Urheberrechtsgesetz geschützt. Das ist auch zu beachten, wenn auf diesen Internetseiten erscheinende Materialien Dritter zur Informationsgewinnung verwendet oder kopiert werden.
Alle Angaben erfolgen ohne Gewähr. Eine Haftung für Schäden, die sich aus der Verwendung der veröffentlichten Inhalte ergeben, ist ausgeschlossen.
Sollte der Inhalt oder die Aufmachung dieser Seiten Rechte Dritter oder gesetzliche Bestimmungen verletzen, bitten wir um eine entsprechende Nachricht ohne Kostennote.
Wir greifen auf Drittanbieter zurück, um Anzeigen zu schalten, wenn Sie unsere Website besuchen. Diese Unternehmen nutzen möglicherweise Informationen zu Ihren Besuchen dieser und anderer Websites, damit Anzeigen zu Produkten und Diensten geschaltet werden können, die Sie interessieren. Falls Sie mehr über diese Methoden erfahren möchten oder wissen möchten, welche Möglichkeiten bestehen, diese Informationen nicht von den Unternehmen verwenden zu lassen, klicken Sie bitte hier: http://www.google.de/privacy.html
Diese Website benutzt Google Analytics, einen Webanalysedienst der Google Inc., um eine Analyse der Benutzung der Website zu ermöglichen. Die durch den Einsatz von Google Analytics erzeugten Informationen über Ihre Nutzung dieser Website (einschließlich Ihrer IP-Adresse) werden an einen Server der Google Inc. in den USA übertragen und dort gespeichert. Google wird diese Informationen lediglich dazu verwenden, die Nutzung der Website zu analysieren, indem anonymisierte Auswertungen und Grafiken zur Anzahl der Besuche, Anzahl der pro Nutzer aufgerufenen Seiten usw. erstellt werden.
Das Copyright für Bilder auf der Seite liegt bei Daniel Schneller bzw. den jeweilig angegeben Personen.
Dieses Impressum basiert auf der Arbeit von Martin Hamann