The IPTrack application that I wrote saves a bunch of Netflow data generated by routers into a MySQL database for analysis and summarization. During peak usage times, it inserts about 50k rows per minute into a table. To keep the table at a manageable size, it then summarizes the useful data into a lower-volume table and deletes any Netflow data older than 10 minutes out of the table. This particular table usually has between 300k and 500k rows in it at any given time.
The MEMORY storage engine operates completely from Ram. Of course this has the advantage of being very quick compared to File based access. The downside is that any data in it is lost if the MySQL instance has to be restarted for any reason. Since my application just uses it for temporary data anyway, and it doesn’t matter all that much if I lose up to 10 minutes of data, this seemed perfectly acceptable.
So, I tried converting the table from MyISAM to MEMORY. For the first couple minutes it looked pretty promising. Disk IO and the machine’s load was pretty small and it was cruising along. But, after ten minutes, when it got to the point where it was purging old data from the table, it came to a grinding halt. The delete statement, which took only a few seconds using the MyISAM engine just kept going and going. It ended up taking about 9 MINUTES to complete. That, of course, is pretty unacceptable. The MEMORY storage engine was supposed to be faster than MyISAM, so this made no sense. The table was already pretty well indexed. I was deleting based on the timestamp column:
DELETE FROM flows WHERE timestamp < ?
and the timestamp column had an index. I had no idea what could make it take so long. I ran across a bug report of somebody else having the problem. The reply was that this was just the nature of hash indexes. The bug report was from 2004 though, so I posted my problem to the mysql forums in search of an answer.
I just noticed that somebody replied to my post and said to use btree indexes instead of hash indexes (the default for the MEMORY engine). Sure enough, I recreated my indexes using btree and it works perfectly now.
CREATE INDEX timestamp_btree USING BTREE ON flows (timestamp);
Thanks to KimSeong Loh on the MySQL forums for solving this for me
One thought on “Using btree indexes to speed up MySQL MEMORY (HEAP) table deletes”
I’ve been monitoring this table and noticed that it just continually grows in size. Upon investigating, I found a MySQL Bug report that confirms the problem and says that it is fixed in 5.0.23. Evidently it is not a problem with the actual memory usage, just how MySQL calculates the usage.