Jamie's Weblog

Procrastination and utter drivel since 2001!
  • Email
  • Github
  • Linkedin
  • Twitter
  • Rss
  • Home
  • About
  • Archives
  • Publications
Home» Java » Performance Benchmarking of Embedded Databases

Performance Benchmarking of Embedded Databases

Posted on March 3, 2005 by Jamie Lawrence in Java

A simplistic look at the performance of several embedded Java databases

_update_: I’ve update some of the results to include HSQLDB’s CACHED tables.

h2. Introduction

As part of my PhD research I am developing a fairly complex simulation of pedestrian movement. Well, it’s at least moderately complex, particularly when scaled up to tens of thousands of people! I’ve have developed the simulation with MySQL as a backend, serving both to provide the input data (overall configuration, street maps, routes, agent preferences, etc) and collect the output data (basically an event is generated whenever two pedestrians walk past each other). Further overviews on my research can be found at in the “research section”:http://jamie.ideasasylum.com/research/phd.html

MySQL was initially chosen because I had some experience with it via PHP, version 4.1 supports OpenGIS definitions (which, for example, allows me to query for all streets within a particular area) and it has plenty of formal and informal support (partiularly documentation and management tools). In order to explore the parameter space of my simulation I need to deploy it on the University’s research cluster. However, accessing a remote database, particularly for such frequent input/output, was not a plausible or efficient situation. The easiest solution was to replace MySQL with a pure-Java, embedded database. Embedded databases eschew the client-server architecture of a mainstream database (such as Oracle, Postgres, MySQL) and instead execute within the JVM and store their data in local files. Most embedded databases still use the standard JDBC interface through which traditional client-server databases are accessed. Thus, you should be able to change the database drivers (and possibly make minors modifications to the SQL) and everything would work just like with MySQL — but faster and with no network requirements. In reality, it isn’t quite that simple.

It seemed like a wise step to spend a short time comparing the available solutions which is what this article is all about. Please note that, although I’m a Java programmer for the past 8 years, I am not a database expert. I make no claims to the accuracy of this article. Please don’t simply scan the graphs and read the conclusions without understanding what I have benchmarked! Also, I wouldn’t try to generalise the results too much — it’s always best to benchmark your own candidates with a representative usecase from your application. I supply the benchmark source code for you to peruse and adapt as you require.

h2. Candidate Databases

So, based on Google, “Java-source”:http://www.java-source.net/ and general knowledge, I went in search of a pure Java embedded database which would fulfil my criteria:

* Free for non-commercial/educational use and/or open source (well, I am a student!)
* Save to local files (which should be documented so I can retrieve them from the remote machines)
* Support JDBC and act as a relational database
* Support auto-incremented integer columns
* Be fast and small with minimal external configuration
* INSERT operations are probably more important than SELECTs so optimisation there would be appreciated
* Good documentation
* Recently updated and under active development

Probably the most widely used embedded database is “Sleepycat’s”:http://www.sleepycat.com/ “BerkeleyDB”:http://www.sleepycat.com/products/db.shtml, of which they now offer “a pure Java version”:http://www.sleepycat.com/products/je.shtml. Personally I’m a bit of a fan of this database but it uses a record structure, not a relational one, and therefore it doesn’t support JDBC. I’ve found three other possible candidates: HSQLDB, Derby (previously IBM’s Cloudscape) and Daffodil’s One$DB. I give a very brief overvew of each database below but the primary focus of this article is performance benchmarking not feature comparison.

h3. MySQL

For the sake of completeness, I include “MySQL”:http://www.mysql.com/ here as it is the database I’m migrating from.
MySQL is a popular cross-platform open-source database with extensive documentation, books and tools. MySQL is a native application, which is accessed in a client-server fashion, and is widely used in combination with PHP and Apache for web applications.

h3. HSQLDB

“HSQLDB”:http://hsqldb.sourceforge.net/, previously known as HypersonicDB, is a mature Java embedded database which has recently found favour with the “OpenOffice.org”:http://www.openoffice.org team and it will be integrated with their forthcoming database office application, Base. HSQLDB is often used in combination with Hibernate. The website for HSQLDB is fairly plain, but easy to navigate, and the documentation appears to be quite comprehensive. There are a few tools supplied with HSQLDB, including a database browser, but several other tools also support this database (including the useful and attractive “DbVisualiser”:http://www.minq.se/products/dbvis/)

h3. Derby

“Derby”:http://incubator.apache.org/derby/ (previously known as Cloudscape) was recently open-sourced by IBM and contributed to the Apache project. As such, it is a mature product but a relatively unknown quantity to most developers. The website is clean and there is a good level of useful documentation available. There are a few tools provided with the distribution including a rudimentary viewer and a command -line interface.

h3. Daffodi One$DB

“One$DB”:http://www.daffodildb.com/one-dollar-db.html is the open-source version of DaffodilDB but retains most of it’s features. One$DB was open sourced in December 2004 and can be embedded into your application or in a typical client-server database. One$DB is supplied with a database browser (although you can also use DbVisualiser) and a good selection of well-presented documentation (although their SQL reference could use some examples). Daffodil can also supply an ODBC driver and database replicator which work with One$DB.

h2. Method

My primary goals were to perform the same tests on each database using the default setup for each system. To this end I’ve written a class, unimaginatively called Benchmark, which creates a table, INSERTs some rows and then reads them back using a SELECT statement. Originally, I had intended to use the same SQL for all four databases but there were enough differences that each database now has its own SQL statements. I was concerned that many of the embedded database would be doing some heavy caching. To exclude this possibility, the database connection is closed after each operation:

start timer 1
open database
DROP the table if it exists and (re-)CREATE the table
close the database
stop timer 1
start timer 2
open the database
INSERT n rows using a PreparedStatement
close the database
stop timer 2
start timer 3
open the database
SELECT all rows using a PreparedStatement
close the database
stop timer 3

This is an example of the table used in the benchmark. It is a simple combination of popular column types which I shall require in my work:

CREATE TABLE test (
id INTEGER GENERATED BY DEFAULT AS IDENTITY,
name VARCHAR( 254 ) NOT NULL,
value INT NOT NULL,
date DATE NOT NULL,
longnumber BIGINT NOT NULL,
floatnumber FLOAT NOT NULL,
PRIMARY KEY ( id ))

h3. Implementation

I won’t write too much on the actual implementation of the benchmarks since its pretty simple and boring. The whole suite (such as it is) is “available for download”:/research/dbBenchmark.zip [~6MB]. It contains the source code, all the required libraries and a Netbeans 4.0 project to build it with. Of course, since Netbeans actually uses ANT as a build environment you can just type ant jar it the main dbBenchmark directory (sorry for the blatant publisising but I’m a bit of a Netbeans 4 fan). There are a bunch of .bat files which may need to be modified for your environments but they should provide you with an understanding of the various command-line options.

Feel free to modify the benchmarks to suit your own requirements. I won’t dignify it by slapping an open-source licence on it but treat it as public domain code — although if you make any significant changes I’d be interested to hear about it in the comments below.

h3. Usage

The benchmark is invoked from the command line using the following options:

usage: java com.ideasasylum.dbbenchmark.Benchmark
|-I,–increment|Increment the select and inserts for load testing|
|-b,–benchmark|The class name of the benchmark to execute|
|-d,–database|The JDBC url of the database|
|-j,–driver|JDBC Driver|
|-n,–runs|The number of benchmark run to perform|
|-o,–output|The output file name (CSV format)|
|-p,–password|The database password|
|-r,–rows|The number of selects to perform (selects should be >= inserts)|
|-u,–username|The database username|

For example, the following command will benchmark a Derby database using the Benchmark class com.ideasasylum.dbbenchmark.DerbyBenchmark, the driver org.apache.derby.jdbc.EmbeddedDriver and the database jdbc:derby:derbytest;create=true. It will perform 10 runs, starting with 5000 inserts and incrementing this by 5000 each time (so the last run will be inserting and retrieving 50000 rows). The output is sent to a comma-seperated file, derbyload.csv.

java -classpath dist/dbBenchmark.jar;lib/derby.jar;lib/commons-cli-1.0.jar com.ideasasylum.dbbenchmark.Benchmark -b com.ideasasylum.dbbenchmark.DerbyBenchmark -j org.apache.derby.jdbc.EmbeddedDriver -d jdbc:derby:derbytest;create=true -n 10 -r 5000 -o derbyload.csv --increment=5000

h3. Configuration

|Name|Version|Driver|
|MySQL|4.1.7-nt|com.mysql.jdbc.Driver (version 3.1.7)|
|HSQLDB|1.7.3|org.hsqldb.jdbcDriver|
|Derby|10.0.2.1|org.apache.derby.jdbc.EmbeddedDriver|
|One$DB|4.0|in.co.daffodil.db.jdbc.DaffodilDBDriver|

All tests were performed using Java 5 on a 2.6GHz P4, 1GB RAM and Windows XP. The machine was lightly loaded (e.g. email client etc). The results from each run are output into a CSV file. An Excel spreadsheet links in these files and plots a few graphs, calculates some averages etc. The data supplied here is not definitive but it is enough to provide a quick impression of the speed of each database.

During the course of the experiments it became clear that HSQLDB was unbelievably fast. _Too fast_. Upon checking the documentation, I discovered that HSQLDB should be shutdown by sending it the SQL command, SHUTDOWN. It doesn’t shutdown properly if connection.close() is used although, since it is a reliable database, no data loss appears to occur. I’ve included a seperate series called “HSQLDB Shutdown” which benchmarks HSQLDB when it is shutdown properly. Update: As someone pointed out, by default HSQLDB creates in-memory tables unless you use CREATE CACHED TABLE. I’ve added two more columns which show the results when HSQLDB is run using CACHED tables and when SHUTDOWN properly.

h2. Results

h3. Drop/Create Performance

|Rows|MySQL|HSQLDB|Derby|Daffodil|HSQLDB Shutdown|HSQLDB Cached|HSQLDB Cached Shutdown|
|5000|532|3844|7625|4172|906|938|624|
|10000|141|0|454|750|375|16|219|
|15000|94|0|390|735|531|0|219|
|20000|125|0|344|703|750|0|203|
|25000|94|16|360|688|938|0|219|
|30000|94|0|406|704|1172|16|235|
|35000|93|0|375|703|1328|15|437|
|40000|110|0|359|672|1563|15|281|
|45000|94|0|297|688|1750|15|250|
|50000|110|0|344|703|1922|16|188|

h3. Insertion Performance

|Rows|MySQL|HSQLDB|Derby|Daffodil|HSQLDB Shutdown|HSQLDB Cached|HSQLDB Cached Shutdown|
|5000|113527|312|7078|4781|766|765|1860|
|10000|227511|422|10374|7156|890|625|1609|
|15000|332400|672|16484|10531|1250|829|1984|
|20000|433583|890|21687|17499|1625|1110|2874|
|25000|540303|1094|35968|18765|1937|1375|3624|
|30000|671339|1313|32125|21733|2406|2531|4187|
|35000|792639|1500|37531|25156|2687|2375|4250|
|40000|934574|1750|42984|28827|3109|3188|5187|
|45000|1048950|1953|48499|32171|3469|2797|4828|
|50000|1246853|2140|53780|35093|3796|4547|4843|

h3. Selection Performance

|Rows|MySQL|HSQLDB|Derby|Daffodil|HSQLDB Shutdown|HSQLDB Cached|HSQLDB Cached Shutdown|
|5000|297|63|187|406|594|63|390|
|10000|265|31|188|359|891|46|328|
|15000|391|47|110|422|1328|46|391|
|20000|421|47|140|516|1734|62|516|
|25000|531|62|156|640|2016|63|703|
|30000|579|62|187|782|2469|266|859|
|35000|640|203|203|719|2828|282|922|
|40000|828|109|390|782|3125|344|968|
|45000|843|110|453|891|3484|125|1062|
|50000|921|125|375|1031|3813|984|1141|

h3. Other Data

* MySQL displays very low CPU utilisation (~2% for the benchmark application and 5% for the mysql server) which probably indicates that the bottleneck is the client-server I/O (which is to be expected).
* One$DB and HSQLDB had a very high CPU utilisation (>95% measured using the Windows XP Task Manager). Again, this was expected since the databases are integrated into the application and the only bottleneck is how fast it can process the data.
* Derby had a lower CPU usage than the other embedded databases (~40-70%)

h3. Graphs

_Click on the graphs for a sharper image_

!/notebook/images/20t.jpg (Average Times for Embedded Databases)!:/notebook/images/20.png
The average times (in milliseconds) for each embedded database to perform an operation involving 2000 rows

!/notebook/images/17.png (INSERT times for 2000 rows)!
The time to complete an INSERT of 2000 rows, performed in 10 consecutive runs. Notice how the first run is significantly slower.

!/notebook/images/18t.jpg (INSERT times for all databases)!:/notebook/images/18.png
The INSERT time for an increasing number of rows. MySQL is disproportionately slow in this test and obscures the performance of the embedded databases.

!/notebook/images/16t.jpg (INSERT times for the Embedded Database)!:/notebook/images/16.png
The INSERT times for an increasing number of rows. HSQLDB, whether SHUTDOWN or not, is clearly the fastest database here.

!/notebook/images/19t.jpg (The SELECT times for all databases)!:/notebook/images/19.png
The SELECT times (in milliseconds) for an increasing number of rows. HSQLDB, when SHUTDOWN properly after each SELECT operation, is cleary the slowest database. Although, it can still retrieve 50,000 rows in less than 4 seconds.

!/notebook/images/15t.jpg (Drop/Create Times)!:/notebook/images/15.png
The time (in milliseconds) to DROP and re-CREATE a table. The number of rows is actually the number of rows which are going to be INSERTed and SELECTed. The number of rows in the table that is DROPped is row-5000 (the supplied increment variable).

h2. Analysis

* All of the databases appear to scale linearly although it is not possible to be sure with the experiments performed here.
* All of the databases could probably be improved with specific optimisations
* Closing the database connection between each operation is obviously a pretty unrealistic scenario. However, the only database which seems to be adversely affected by this is HSQLDB when SHUTDOWN. In all the other cases,
* The first benchmark run is always a little slow which is almost certainly due to the JVM’s Just-In-Time compilation.
* MySQL has awful INSERT times compared to the embedded databases
* MySQL is fairly efficient for row retrieval, keeping up with the embedded databases.
* HSQLDB is ridiculously fast for everything (INSERTs, SELECTs, DROP/CREATEs) and it appears to scale better than any of the others.
* The DROP/CREATE times are not a terribly important benchmark but it does show that HSQLDB, when SHUTDOWN properly, is the only database whose times do not remain constant regardless of the number of rows existing in the table.
* HSQLDB, when it is not SHUTDOWN properly, has almost no table DROP/CREATE overhead.
* HSQLDB is generally slower when it is actually shutdown properly (using SHUTDOWN) and it would appear that the main bottleneck with HSQLDB is the SHUTDOWN process
* HSQLBD doesn’t seem to exhibit any adverse effects if it is not shutdown properly, which is as you’d expect from a good database. Of course, you are left to wonder, if HSQLDB does suffer adverse effect when not SHUTDOWN properly, why incur the speed penalty?
* Derby is slower than One$DB for INSERTs but has better SELECT performance
* Most of the databases are optimised for SELECT with the exception of HSQLDB which has extremely fast INSERTs and only mediocre SELECTs.
* Whilst I wouldn’t put too much stock in the exact figures, it is interesting that *HSQLDB takes less than 4 seconds to INSERT 50,000 rows, whilst MySQL takes over 20mins!*
* I am suspicious that MySQL’s INSERT performance is so poor compared to its SELECT performance. I believe this could possibly be a configuration problem on my part, although I have no idea what that might be.

h2. Conclusion

Choosing a database is not just about performance — but that is usually one of the criteria. It seems fairly clear that any of the embedded databases will have a significant performance advantage over MySQL.

For me, the winner is HSQLDB, particularly due to its wonderfully fast INSERTs performance, which will be the bulk of the work required from the database. The SELECT performance would also appear to be excellant when the SHUTDOWN times are excluded. Since MySQL possesses useful GIS support, and better integration with external reporting tools, I won’t be completely leaving it behind. My intention is to use HSQLDB during the simulation runs for input and output data, but then load the data into MySQL when performing the analysis of the experiments.

I’d urge anyone choosing between databases to run their own benchmarks and don’t rely on my results and interpretations. Feel free to download my “Benchmarking suite”:/research/dbBenchmark.zip and modify it to your own requirements. I won’t dignify it by slapping an open-source licence on it but if you make any significant changes I’d be interested to hear about it in the comments below.

*Update*: Despite my initial mistake in using HSQLDB in-memory tables, the CACHED version is still faster than the other contenders and the SELECT performance doesn’t suffer from the final write-to-disk bottleneck of in-memory tables. Other people have noted that MySQL would be much faster is the INSERTs were performed inside a single transaction. This is quite true but I have to migrate away from MySQL and it made sense to benchmark MySQL is the way that I currently use it. There’s “a thread running on JavaLobby about this comparision”:http://www.javalobby.org/java/forums/m91827199.

48 Comments

48 comments on “Performance Benchmarking of Embedded Databases”

  1. aaa says:
    March 4, 2005 at 12:27 pm

    Hello, intresting comparison. One of my friend had similar results comparing MySQL versus HSQLDB. However, you should note that using HSQLDB in memory is not a good choice if your DB size is high. this not only sucks system resources, degrades DB startup performance too. Maybe you should make a test with “CACHED TABLES” for HSQLDB, in which you can go until 2GB of DB size with limited memory use. nice article anyway.

  2. jamie says:
    March 7, 2005 at 7:03 am

    your wish is my command ;-)

    I’d purposly avoided reading the documentation (too much) because I didn’t want to apply DB-specific optimisations. However, I’ve added the results for CACHED tables and CACHED table with a proper SHUTDOWN. This seems like a better comparison with the other DBs but HSQLDB still wins for my situation.

    Thanks for your comment!

  3. Andrew Chandler says:
    March 7, 2005 at 10:15 am

    Hi Jamie – quick note, you wonder about the “why incur the overhead of the shutdown”. The main thing is that if you don’t do a shutdown then when you actually restart the jvm and open the database for the first time it has to load the cached table and then replay the transaction log to get an up to date picture of data prior to the database going offline. “shutdown” will empty the log and finalize the cache state on the shutdown end….basically you incur the overhead, its just whether you want to incurr it at startup or shutdown. For us we do it at shutdown time because it makes our app initialize faster giving the “impression” of responsiveness. Also the sql “shutdown compact” will take even longer but removes dead space and optimizes the cached data file.

  4. Dima says:
    March 7, 2005 at 10:44 am

    One thing to remember about embedded databases (and HSQLDB in particular) is: how much smarts can you put in ~200K of code?

    If you benchmark complex queries (joins with subselects, IN clauses, etc.) you’ll find that HSQLDB’s weak spot is its query optimizer.

  5. Wiliam E. Taylor says:
    March 7, 2005 at 11:46 am

    Hello…

    I use HSQLDB because of the performance as you found out in your benchmark.

    It’s a shame that the newer product version of HSQLDB, HXSQL (HyperXtremeSQL) at hxsql.com, does not meet your DB testing requirements. It’s not free and not open source but shares the front end and utilities with HSQLDB, presently. The HSQLDB people are running low on contributions and this product version could help resolve this financial risk in supporting HSQLDB. It is a very new product version of HSQLDB.

    For my product “JHackerAppManager”, that will be out soon, requires HSXQL over HSQLDB. I have to split my product into a free and pro version. The first uses HSQLDB and the second HXSQL because of cost for HXSQL and has more features. I have not created a website for it yet. Have a non professional temp site, http://home.ptd.net/~weconsul/, with couple JPG’s for it and changes I’ve made to the HSQLDB DatabaseManagerSwing class recently. My product is a front end auditing tool for Windows XP SP 2 Firewall. It evolved out of my own requirements and also my financial needs. Oh, I’m a “On-the-bench” consultant, presently. Sorry, I’m not here to push my product.

    Anyway, I do a lot of row processing and HXSQL is between 100% to 300% times faster than HSQLDB based on type of processing. I use cached tables. I found that you did not benchmark HSQLDB’s text based database abilities. That is a shame because it would be interesting to compare this to other databases. Guess it would take more research into HSQLDB and the other database do not have this ability.

    Basically HXSQL makes HSQL look like it is running a race with metal boots on, compared to HSQL. It would be nice for you to contact Fred Toussi, the HSQLDB maintainer, at fredt@users.sourceforge.net to talk to him about bench marking it. This would help them out and hopefully, from you findings, if mentioned in your research and articles, get the word out. I don’t think people know about it. It’s just that it makes HSQLDB look like it is standing still. If you contact Fred mention my name he does know me by handle IchBin or Ed Taylor.

    Thanks in Advance…
    IchBin
    __

    ‘The meeting of two personalities is like the contact of two chemical substances: if there is any reaction, both are transformed.’ – Carl Gustav Jung, (1875-1961), psychiatrist and psychologist

  6. D.Lopez says:
    March 8, 2005 at 12:31 am

    Hey there,
    Nice article and good luck with your project.

    Just a small suggestion, it might be good if you put the “units” in the results comparision, to make it clear what you are talking about.
    For example, first result is that with 5000 rows MySQL -> 532…. and one has to wonder… 532 what? Secs? Millisecs? 532 times in a given time? Is it better if the number is high or is it better if it’w low?… ;) . Just a tiny detail to give some meaning to the numbers.

    As an interesting bonus, it might be interesting to see how difficult were the different DBs to set up, if they force you to write static paths, etc, to see a bit more than just raw performance.

    In any case, thanks for sharing the results.

  7. jamie says:
    March 8, 2005 at 2:25 am

    Yeah, I should have mentioned more consistently that the times are in milliseconds! So, the results are the time, in milliseconds, to complete and INSERT/SELECT for x number of rows. The DROP/CREATE times aren’t terribly interesting or useful.

    The average times graph is actually the (average of (the time taken for each operation on X rows / X rows)). So the vertical axis in that graph is the average time taken for a database to perform an operation on 1 row (INSERT a row, SELECT a row).

    Like I’ve said, I don’t put too much stock in the actual times but rather the relationships between them.

    With regards to DB configuration, One$DB was probably the worst database in this respect (and was one reason I haven’t chosen it) as it relied on an environment variable or system property to specify the directory where databases are located. Derby and HSQLDB both allow you to specify local paths. Derby creates a directory (with sub dirs and files) with the name of your database whereas HSQLDB creates up to 4 files with the name of the database (e.g. database.{script, properties, data, backup})

  8. David says:
    March 8, 2005 at 3:09 am

    you implements your own benchmark-tool ?

    try JMETER
    http://jakarta.apache.org/jmeter/usermanual/build-db-test-plan.html

    cheers
    David

  9. jamie says:
    March 8, 2005 at 3:29 am

    mmm… I didn’t realise JMeter could benchmark databases! I thought it was just for servers.

    Still, by writing my own tool I had the oppurtunity to familiarise myself with each database.

  10. Gerry Power says:
    March 8, 2005 at 6:41 am

    If speed is your need, you might want to check out Prevayler @ http://www.prevayler.org/wiki.jsp. Based on my performance test, it is orders of magnitude faster for selects.

    Although Prevayler is an in memory store, the default HSQLDB is in memory as well, so mem usuage will be roughly the same.

    The equivalent of insert in Prevayler is 2-10 times slower, as it takes great pains to ensure no data loss, unlike HSQL. HSQL can easily lose data if it is not shutdown properly.

    The other thing to mention about Prevayler is that it does not support SQL / JDBC syntax, which at first blush would seem like a problem, but as a Java guy, it turned out to be the best thing. In an hour I created a persistent Map w/ Prevayler, and now I don’t think about DB persistence anymore, I just throw my objects I need persisted into the Map, and voila, they are there next time my app runs. Makes persistence a no-brainer.

    Good luck,
    Gerry

  11. Jasper says:
    March 9, 2005 at 3:32 am

    I did some similar benchmarks about 6 months ago looking for a free database that I could embed into a commercial java application. Hsqldb is by far the fastest till you get much more data than you have ram or try very complex queries. I have not tried the new commercial extreme version. Firebird is a notable omission from your benchmark as it’s very easily embedded in a java application (http://firebird.sourceforge.net/) but is our tests the performance wasn’t great. Our final choice was Posgresql 8. It’s not designed to be an embedded database but with few lines of code it can be started in the background when your java application starts and stopped at the end. It starts up in a couple seconds so that is not an issue. If you need a full featured database with full transaction support and complex queries then there is no comparison. We are doing 10 way joins across tables with 100,000 plus rows, total db size 500Mb plus: in 1 or 2 seconds, most of the others cracked under that sort of queries, with query times will over 30 seconds and some in the tens of minutes.

  12. Leonard says:
    March 9, 2005 at 7:37 am

    What about MckoiDB?
    Nice open source java database.
    http://mckoi.com/database/

  13. matt says:
    March 9, 2005 at 9:11 am

    Don’t even waste your time on Prevayler!! Really..

  14. Dave Henry says:
    March 9, 2005 at 9:48 am

    Thanks for the interesting article, benchmarking is always a tricky subject.

    I think you have to be clear what you are looking for in a database (performance, single/multi-user, resilience etc) as performance is not always the most important consideration. If you only have a few 10,000s of records, does it matter that one database takes 0.5s and another takes 0.1s to do a query? If you deal with millions of rows with complex sub queries then performance is much more important.

    I recently carried out an evaluation of open source databases against my specific requirements. I found that when I took all the other factors into account that Derby was fast enough for my purposes and it was things like encrypted data files, serialised transaction support, robust recovery after power failure which were more important to me.

    Hsqldb could be expected to be faster as it only provides READ_UNCOMMITTED transactions. I think the other databases tested all provide various levels of transaction isolation (i.e. other transactions can’t be seen by the current statement until they are committed). I would guess these other database would be faster if running in the same transaction mode.
    As with all benchmarks they are a useful guide but you really have to do your own testing with your own data/worst queries etc to get a view on how it would work for you.

  15. Henry Story says:
    March 9, 2005 at 12:17 pm

    It would be interesting to compare your results to an embedded Semantic Web database. I am currently using Sesame for my work on BlogEd. It acts as an ebedded database when using their native Sail. (One could get Sesame down to 300KB with just the native sail, btw)

    One could test the speed for inserting triples,
    deleting triples just as in your research.

  16. patrick peck says:
    March 9, 2005 at 1:21 pm

    I think your benchmark concerning insert performance is flawed. Any decent implementation would use batching of insert statements. This improves performance by an order of magnitude. I suggest you check out http://forum.java.sun.com/thread.jspa?threadID=532257&messageID=2567772 and rerun a modified version of the benchmarks.

  17. jamie says:
    March 10, 2005 at 2:23 am

    Patrick, I’m not sure that I’d agree with your assertion that because I don’t batch the inserts then my benchmark is flawed. However, thanks for pointing out this functionality! As I said, I’m not really a DB expert but I had assumed that, if I looked hard enough, I’d have found optimisations such as batching. As a very simple benchmark, I wanted to avoid too many optimisations and see what the “default” performance of each database was.

    I will have to see which optimisations I will apply in my simulation. To date, the most commonly mentioned optimisations are turning off automatic transactions, turning off index updates and batching insets.

  18. jamie says:
    March 10, 2005 at 2:26 am

    BTW, I’m delighted with all the feedback that I’ve received to this article (both positive and negative).

    The purpose of this article was simply to say “I did this… and these are the results I recorded”. I’m not trying to make grand universal claims about database performance or tell you which database to choose. Other people have performed similar benchmarks, as part of their own decision processes, but I felt it would be beneficial to all (including me!) if I wrote up the article and published it (doesn’t the Hacker Ethic include “the belief that information-sharing is a powerful positive good”! :-) So far, I’ve had plenty of good comments: some people found the article useful, others have pushed back on my method of benchmarking, and still others have suggested optimisations and alternative databases.

    Thanks to all!

  19. Frank says:
    March 11, 2005 at 4:02 am

    I really liked your basic comparison of embedded databases! It gives a simple overview of the databases and it is open for enhancements.

    I have been interested in Sqlite and Hsqldb for a period, and your tests was just what I was looking for to get a speed comparison.

    I downloaded you code, and made a few tests of my own: – I added sqlite (3.0.8 – I haven’t built the driver for the newest version yet) – I added Mckoi (1.0.3) after seeing a reference for it here. – I added the new release candidate of hsqldb (1.8 rc9) – I changed the load tests to be from 50000 rows to 500000 rows.

    Here are a few observations:
    – In Sqlite and Mckoi autoCommit must be set to false, which seems reasonably enough.
    – The newest release of hsqldb (1.8 rc9) seems to be 20% – 30% faster than 1.7.3
    – Sqlite seems to be 40% faster (for inserts) than hsqldb (the shutdown test – I’m not really sure which one to compare it with) and much faster for the selects.
    – For the load tests hsqldb throws an out-of-memory exception after 200,000-300,000 inserts, while sqlite just keep running all the way to 500,000. I can probably be tuned/configured but an interesting observation.
    – Mckoi is really really slow (a factor 30 for inserts compared to sqlite). But is has a much better support for SQL than sqlite.

    If you are interested in the updated source and spreadsheet send me a mail.

  20. jamie says:
    March 11, 2005 at 5:44 am

    Thanks for the update, Frank!

    I’ve realised that 50,000 rows isn’t that much of a stretch for some databases and, in response to your comment, I’ve also just tested HSQL (1.7.3) with your scenario. You’re right, OutOfMemory error at 350-400,000 rows. This occurs during the select test and is a known problem (the whole resultset from the select must fit into memory). This can be resolved by limiting the number selects returned. Derby was able to to the insert/select test up to 550,000 rows (and probably beyond), but it did take 9mins to insert this amount.

    I hadn’t tested SQLite because I wanted a pure Java solution although you seem to have got very good results for something that requires a native interface (I’m impressed). I can’t remember why I didn’t include McKoi

    It’s also worth noting that HSQL v1.8 can now support databases of up to 8GB, up from the previous limit of 2GB.

  21. Nishant Saini says:
    March 11, 2005 at 6:09 am

    Nice comparison. It seems to be very useful if we have to choose an open source database to embed in our application. But it could include more open source databases like Mckoi DB.
    Anyway thanks for useful test carried out,

  22. Levent Gurses says:
    March 11, 2005 at 2:21 pm

    Good work! Nicely detailed and explained. As a longtime MySQL user I was suprised to hear that ”…HSQLDB takes less than 4 seconds to INSERT 50,000 rows, whilst MySQL takes over 20mins”. Now I am wondering about the possible areas of application of these embedded databases. Could they save time for smaller applications by providing a configuration-free database capability? Where and how? How would that compare to XML databases? Just a few questions your research sparked in my mind…

  23. Mike Cherichetti says:
    March 11, 2005 at 6:17 pm

    I’m surprised by those MySQL results, they seem really high… did you forget to turn off your virus scanner? :)

  24. gangadhar says:
    March 11, 2005 at 11:55 pm

    whats the default isolation level that the dbs are running at.. that might affect performance if one db is at read uncommitted versus another database is running at read committed.

    Do transaction do what they are supposed to do, does a commit actually flush the log to disk..
    if not, transactions may not be fail safe :(

  25. manuel says:
    March 13, 2005 at 5:47 am

    concerning batching of jdbc statements: First I was shocked as I saw your results for mysql. Last year I did some performance tests regarding insert speed of RDBS (oracle, SQL Server, SP DB, mySQL). mySQL came out as fastest. I use it for doing mass inserts in our application (more than 100000 rows per transaction). We are using batch statements and our performance is more than 20 times better. What you are testing in this scenario is rpc-speed between your two processes (mysql and your app – more of an os benchmark) than doing a DB benchmark. However I would not say that the benchmark is totally flawed – but concerning the comparison with an RDBMS like mySQL maybe you should give an explanantion for the unexperienced user or maybe you should add some data with varying batch sizes.

    However – nice work – next time I will have some embedded DB on my radar.

  26. brush says:
    March 14, 2005 at 6:52 am

    I think that the problem with MySQL here is that it’s running in autocommit mode and has to synchronize data to disk on every insert. To give it a bit more fair conditions you should wrap your thousands of inserts into a single transaction (begin/commit). Otherwise you are not realy benchmarking
    mysql vs java db but rather CPU to MEM vs CPU to DISK performance.

  27. Tzvika says:
    March 15, 2005 at 5:28 am

    Hi there

    Thanks for the interesting benchmark. I made a slightly less serious effort myself and got at about the same results. HSQLDB has miracle performance.

  28. Manoj Kr. Sheoran says:
    March 15, 2005 at 8:15 pm

    Hi Folks ,
    Thanks for intersting comparision.Me too made a serious benchmark to test dql operations (that’s a most important part) and found DaffodilDB 4.0 his a very good performance .

  29. René says:
    March 16, 2005 at 1:41 am

    Hi there,
    I would also be interested if HSQLDB was run totally in memory during the tests because if so, I have to follow brush’s argumentation. I’m pretty much in the same situation, writing an agent-based simulation that needs a database. I have also found simpleSQL (www.simplesql.org), a free database that is really leightweight. I’m not sure if it qualifies for your tests but if anybody has experience with it, please drop me a note.

  30. Michael Kimber says:
    March 18, 2005 at 12:56 am

    An interesting study and one that has made me want to find out more about the potential applications of Embedded Databases (OODB, RDBMS ISAM etc). Outside of google.com can you recommend any Websites of literature that get me up to speed in this area?

    Regards Mike Kimber

  31. Jesper Söderlund says:
    March 22, 2005 at 2:13 pm

    After reading your interesting article I’ve had an interesting discussion with my collegues around the ACIDness of HSQLDB .

    It seems that HSQLDB doesn’t adhere to the ACID properties fully, which is why it can seriously out-perform the other databases that you tested (try SET WRITE_DELAY FALSE and you will get a very different performance behavior).

    For certain types of applications this might be an concious choice that you want to make but in this regard it’s not a totally fair comparisson between the different DBs. In the application which I had in mind I think it’s a trade off I’m willing to make.

    It might be worth mentioning this in your article.

  32. Colin Zhao says:
    March 25, 2005 at 9:35 am

    Mamie:

    Very interesting numbers. I wonder what number you will get from oracle.

    I have been working on a little package called portablerulebox on sourceforge lately. My intention is to use the hsqldb in-memory mode to let applications cache tables from a remote database, so that application sql queries (read only no updates) can be directed to the memory copy of the tables. This is, of cause, only for caching static rule tables with number of rows in tens of thousands. I was expecting to see a performance boost among other adventages.

    I am almost done with all the coding, but the testing result is disappointing. I create three tables in Oracle with 50, 500 and 25000 rows each. The last one is an association table of the first two. My test query is a join between the first and third tables. I run the test on my home XP machine with the oracle database installed on it. The test compares the speed of the same query against tables in Oracle (remote query) and their cached copy in hsqldb (local query). The remote queries complete consistantly in 16 ms. The local queries range from 31 to 78 ms.

    The code replicate the tables with primary key and all indexes. I don’t see anything I didn’t do to cause this, other than that hsqldb is slower than oracle (java vs C?).

    I found this thread in google. Maybe you can help me configure why or reach some conclusion. Thanks.

    Colin

  33. Colin Zhao says:
    March 25, 2005 at 12:16 pm

    I figured out why my numbers were puzzling. I did’t include the fetching of rows returned from the queries when I count the time. So my numbers included only the pattern matching part of the query. No wonder Oracle comes back faster. When I include the fetching, my package using hsqldb will give us a significant improvement 40 vs 250 ms for 500 rows on my machine. For returns of rows in the tens, there is not much difference.

  34. Carl Rosenberger says:
    March 31, 2005 at 1:53 pm

    You may like to find this new SourceForge project interesting:
    http://www.polepos.org

    The tests seem to go beyond what you have tested, more into the direction of benchmarking objects and also into join performance.

    The framework is fully automated and renders the graphics by itself, so you don’t have to go through the hassle to creating them with Excel.

    Best,
    Carl

  35. Aditya says:
    April 14, 2005 at 4:06 am

    The mySQL insert issue seems to be due to commits on disk. Since you are interested in only in-memory database you can allow mySQL to commit in memory by setting a parameter innodb_flush_log_at_trx_commit=0.

    We achieved mySQL insert rate increase from 50 to 700 by doing this.

  36. Harry says:
    April 16, 2005 at 10:18 am

    I am using Hsqldb with Roller and seems to work fine on a fairly slow pc.
    However if you want really fast try Kdb (www.kx.com). It has an sql and jdbc interface but is really meant for K users. It is certainly worth a visit.

  37. Josh Chamas says:
    May 2, 2005 at 11:19 am

    Those using MySQL can usually get many thousands of inserts per second. I have seen over one hundred thousand per second too.

    Here are some tips to this effect:

    1) Set cache buffers higher like key_buffer_size and innodb_buffer_pool_size as these default to 8M each. For a 1G system, shared with your application, it would be typical to allocate these buffers to over 100M each.

    2) For InnoDB, use transactions. For either MyISAM, or InnoDB, using batch inserts can be fast where many rows are inserted at one go. Syntax for this is INSERT INTO TABLE (COLS) VALUES (ROW1), (ROW2), …, (ROWN)

    3) The prior poster comment of setting innodb_flush_log_at_trx_commit=0, or =2 is a nice performance boost for InnoDB tables

    Cheers!

  38. Sylvain says:
    July 13, 2005 at 10:32 am

    Running on Mac OS X:
    The benchmarks can be run on Mac OS X with little modifications (install Netbeans 4.1 for OS X first, if you have not done so already).

    1. In Netbeans, open the dbBenchmark project Jamie provided and build it. This will create a dbBenchmark jar file in a dist folder. No changes are necessary (I tested on OS X 10.4.2).
    2. In the Terminal window run the command scripts that are provided in the .bat files, but after a couple of corrections: A. Change the ”;” to ”:” in all classpath comands. The ”;” is illegal for java on Unix. You must put ”:” between jar files. For example,
    -classpath one.jar;two.jar
    will become
    -classpath one.jar:two.jar

    B. Put the JDBC driver command between quotes because the ”;” in blahblahblah;create=true is interpreted as a new command by the Unix shell. For example, -d jdbc:daffodilDB_embedded:daffodiltest;create=true -u
    becomes:
    -d “jdbc:daffodilDB_embedded:daffodiltest;create=true” -u

    With those changes the tests will run. There are sometimes warnings when the tables have to be created the first time.
    Also the benchmark.xls external data import is virtually unusable in Excel on the Mac. That’s a Microsoft problem: Excel does not tell you which file it wants when asking for each external data file. I moved the files to a PC to manually re-import the data.

    I have not tested the mysql database yet.

  39. Peter Kovgan says:
    July 13, 2005 at 11:52 pm

    Great work!
    I just in process of comparison of PointBase and Derby and last conclusion is
    “Derby is not recommended due to speed degradation starting from certain DB size”
    “Derby’s INSERT is also much more slower then PB”

  40. Peter Kovgan says:
    July 31, 2005 at 5:37 am

    Previous message is not true. Insert speed is even better on Derby if both DB commit every transaction.
    Derby also has not speed degradation if field after WHERE clause is indexed.

  41. James Cowan says:
    August 10, 2005 at 7:48 am

    On the issue of hsqldb’s memory usage, there seems to be 2 issues. Using cached tables means you reduce amount of ram used to store data (it gets copied to disk) but selecting data still copies the entire result set into ram. Is that right?

    Does derby and the derby jdbc driver store the entire result set in memory as well?

  42. dan says:
    October 5, 2005 at 1:35 pm

    Hi,
    we have a fairly db-intensive java server app which normally runs 24×7 and can have Gb-sized dbs. We have used HSQLDB and McKoidb as embedded databases, and are now thinking of migrating (Derby or Daffodil). Why?
    Because from real life experience we’ve seen that neither HSQLDB or McKoi are stable. They both suffer db corruption (and even beyond repair).
    HSQLDB is very fast, but not stable. It is so fast because when you run an INSERT, it simply writes the INSERT statement to a file (script). Then when you shutdown it recalculates all the tables by running all the queries in the script to a table. So the shutdown times can be huge if you use it in a long-running application. And if the computer loses power during shutdown, the db is corrupted and unrecoverable – by own admission of the main author!. And startup times for big databases are just too big. HSQLDB may be enough for small dbs, but there is a serious drawback for the fast performance you get…
    McKoi is a bit more stable, but still we’ve seen too many customers with corrupted DB to recommend it to anyone. Example: if the disk runs out of space on windows, and you do inserts or deletes, the db will get corrupted. Verified heaps of times. And this is not the only case, we’ve seen too many ‘Area pointer not allocated’ errors over time. Also see mckoi’s support forum.
    It is easy to benchmark databases with some scripts, but in my opinion performance comparison is meaningless if the first key requirement of a database is not met: to reliably store data. And unfortunately there is far less information available on how good these db are in this respect. We’ve had to learn it the hard way. Just wanted to share the experience.

  43. Fred Toussi says:
    November 11, 2005 at 8:43 am

    Regarding the comments above by “dan” a.k.a. no-one at nowhere.com, it is not the case that a power loss during shutdown would automatically result in a corrupted and unrecoverable database. The only known case involves a missing .properties file that can be replaced by the user and we have advised our users to do so.

    The shutdown times are relatively long for gigabyte databases because a full backup of the .data file is made, not because the queries are rerun. This backup ensures a much better chance of recovery if there is a power failure while the database is operational.

    Fred Toussi
    hsqldb.org

  44. Rahul says:
    March 18, 2006 at 11:34 pm

    Hi,

    Good comparison of databases. I am specially interested in HSQLDB and appreciate if you could comment on this.

    I am doing load testing on JBOss with inbuilt HSQL database as well as standalone hsql server.

    1. How many number of concurrent connections hsql can handle?
    2. If we apply 500 messages/sec to insert in a table, can hsql handle this? If not, what is the limit?
    3. How to do backup of hsql database without stopping jboss/hsql
    4. Can you suggest any good database connection pooling that can be used for hsql? I am using dbcp, but facing below error.

    On applying more than 500 messages/sec over long time (24 hours), memory increases gradually resulting in outof memory error (before it reaches data file size limit of 8 GB)

    I am not sure if this is bug in hsqldb.

    I will share my load test results when ready. Your feedback would be of great help.

    Thanks,
    Rahul
    (rahulap18 at gmail dot com)

  45. Jan says:
    April 2, 2006 at 5:04 pm

    Thanks mate! I quoted your comparison in my report about RDBMS – hope you dont mind :-) Good luck with PhD!
    Cheers!
    Jan

  46. Peter Zaitsev says:
    April 10, 2006 at 3:11 pm

    Hi,

    You have not provided information about MySQL configuration as well as what storage engine you’re using.

    As you’re mentioning 2% CPU usage for MySQL I strongly suspect you was running Innodb in ACID mode while candidates were not configured ACID. If you’re doing single row inserts in separate transactions from single thread you have to be limited by the disk in such cases.

  47. westant says:
    July 3, 2006 at 10:40 pm

    hello, i have two questions,
    1) can you get a stable data when you run the same cases against to the same product? in my testing, i always hit unstable problem. the data varies much from run to run.
    2) did you see the impact of java garbage collections? in my testing, the operation will take 100-1000 ms while java garbage collection will take 10-50 ms every time.
    thanks..

  48. Jamie Lawrence says:
    November 13, 2006 at 3:21 pm

    Due to excessive spam, the comments are now closed. This article has really run its course now and is 18 months old… really you need to run your own benchmarks now.

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Contact Me

  • jamie@ideasasylum.com
  • ideasasylum
    • Twitter

Categories

  • BalanceTrackr (2)
  • C# (1)
  • dualpricing.ie (2)
  • hack (1)
  • Java (4)
  • Linux (3)
  • Measure it to change it (1)
  • Parenting (2)
  • Personal (36)
  • Photography (13)
  • Projects (7)
  • Python (3)
  • rant (17)
  • Reviews (20)
  • Ruby on Rails (16)
  • Taskmonifier (1)
  • Tech (1)
  • Tips & Tricks (7)
  • tuesdaypush (1)
  • Uncategorized (877)
  • Useful (3)
  • Weekly Picks (5)
  • Work (2)

Tags

accident barcampcork batteries bizcamp business car copyright depression diy dual pricing eneloop fowadublin fun gallstones git svn hack health ipad iphone lens life material motivation nginx Personal phd photographers photography photos pinterest podcasts quote rails rant renault scenic review rspec ruby sew shutterscouts startups swimming vagrant windows wrap

Meta

  • Log in
  • Entries RSS
  • Comments RSS
  • WordPress.org

(c) 2012 Jamie's Weblog