Hello, blog! Long time without posting. Here I am again.
My attention was caught a few days ago by a contest posted by Tokutek.
This is a company that makes a proprietary engine for MySQL. In their contest, they claim that it takes 20 days to insert 1 billion records into a InnoDB table, and that MyISAM is not even up to the task. In their example, InnoDB inserts 404 rows per second and MyISAM died after a few days of inserting 93 rows per second.
It looks farfetched to me. Surely you can do better than that.
Now, I don't have a quad 3.16Ghz Xeon server with 16GB RAM available like the one used by Tokutek, and so I tried with a modest quad 2.7Ghz AMD, 8GB RAM server. The problem is mostly the same. Once your index becomes bigger than the available RAM, you are in the glue. The only difference is that my server was slower.
I didn't try with InnoDB, but I tackled the MyISAM case at once. Using MySQL 5.1, I just added 400 partitions to the contest table, partitioning by range of 50 million records. And I made a simple Perl script that produces INSERT statements with 50,000 rows each.
The result? I inserted the rows into a MyISAM table at an average rate of 8,800 rows per second, with peaks of 25,000 rps. The speed did never drop below 5,000 rps anyway. The whole operation took 31.5 hours. That's much less than 20 days, and surely a hell of a lot less than the projected 167 days that should be needed for MyISAM according to the contest slides.
Should I claim the ridiculous contest prize? (a $100 Starbucks card? Gimme a break!)
No. I just wanted to make a point. The above instructions are enough for anyone savvy to reproduce the case. If the prize changes, and becomes a $2,000 donation to the FSF, for example, I will release the code that I used. In the meantime, enjoy your own tests!
While I was testing, I saw that MySQL released 5.1.30-GA. Great!. I used 5.1.29 for my experiment. Had I known about the release, I would have waited two days. Anyway, if you believe TokuTek benchmarks, using 5.1 partitions will result in a performance 200 times better than the original. There must be something wrong in their method.
Thursday, November 27, 2008
Subscribe to:
Posts (Atom)