Python Faster Than Java, At Least For BerkeleyDB


I tested the new Oracle Java edition of Berkeley DB, and for my tests the word that best describes the product is FAIL. The conclusion, more or less, is that Oracle screwed up the Berkeley software. There is no HASH, only BTREEs, a lot of BerkeleyDB functionality has been removed, etc. They say it is BerkeleyDB with a Java flavour; they have a weird sense of Java humor, then.

I was expecting Java to be faster by a factor of 5X, since many blogs suggest that real applications see that kind of improvement. And since BerkeleyDB, for me, involves a lot of memory use and processing, I was expecting at least a 2X factor. I got a 2X factor, but in the opposite direction.

Results

I tested creating a table of 1 million rows with a 128MB memory cache.

Language Time
Python 12 sec
Java 28 sec

So the refactoring of OctopusDbm, if any, will go through C++ on Linux. I downloaded the Linux Tools for Eclipse and will start playing with them when I have some time.

It seems that Python with C modules works great.

[Update Feb 22 2010]

Each test was run 5 times and the results were averaged. The setup was:

  • BerkeleyDB.4.7
  • bsddb3-4.7.2 (Python API)
  • Linux SuSE 10.1 Pentium 4 Hyperthreads
  • Kernel 2.6.16
  • Sun Java 5
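
A minimal sketch of how five runs could be timed and summarised, reporting the standard deviation alongside the average. The class name RunStats and the placeholder workload are assumptions for illustration only; this is not the harness that produced the numbers above.

```java
public class RunStats {

    /** Arithmetic mean of the samples. */
    public static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (n - 1 in the denominator). */
    public static double stdDev(double[] samples) {
        if (samples.length < 2) {
            return 0.0;
        }
        double m = mean(samples);
        double squares = 0.0;
        for (double s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / (samples.length - 1));
    }

    /** Runs the task the given number of times; returns each wall-clock duration in seconds. */
    public static double[] time(Runnable task, int runs) {
        double[] seconds = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            seconds[i] = (System.nanoTime() - start) / 1e9;
        }
        return seconds;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): a real test would insert the rows here.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100000; i++) {
                sb.append(i);
            }
        };
        double[] t = time(workload, 5);
        System.out.printf("mean %.3f s, stddev %.3f s%n", mean(t), stdDev(t));
    }
}
```

A large standard deviation relative to the mean would suggest the runs are being disturbed by something outside the test itself, such as JVM warm-up or disk caching.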

And here is the Java code used:

/**
 * Testing for BerkeleyDB
 */
package test;
 
import java.io.File;
import java.util.Date;
 
import com.sleepycat.je.*;
 
/**
 * @author Jorge Alegre
 *
 */
public class BerkeleyDB {
 
	private final static String HOME = "/index/test/";
	private final static long CACHE_SIZE = 128000000;
	private final static Integer MAX = 50000;
 
	/**
	 * @param args
	 */
	public static void main(String[] args) {
		// TODO Auto-generated method stub
 
		EnvironmentConfig envConfig = new EnvironmentConfig();
		envConfig.setAllowCreate(true);
		envConfig.setCacheSize(CACHE_SIZE);
		Environment env = new Environment(new File(HOME), envConfig);
		System.out.println("Data: " + envConfig.getCachePercent() + " " + envConfig.getCacheSize());
		DatabaseConfig dbConfig = new DatabaseConfig();
		dbConfig.setAllowCreate(true);
		long t_1 = new Date().getTime();
		System.out.println(t_1);
		Database db = env.openDatabase(null, "testing", dbConfig);
		DatabaseEntry key = new DatabaseEntry();
		DatabaseEntry data = new DatabaseEntry();
		String sData = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
		data.setData(sData.getBytes());
		for (Integer i= new Integer(0); i


  • Pete
    Hey Jorge,
    Could you post the part of your code that was cut off? I recently tried setting up a BerkeleyDB (I am very new to it), and editing their sample je.gettingStarted.ExampleDatabasePut code to (I thought) simply build a database with 1M rows was taking forever. Could you post the rest of your example? I'm still sorting out what I've done wrong. Thanks a lot!!
  • Greg,

    Thanks for your feedback. I have included the Java code in case something in it caused the bad results; I would be glad to hear about optimizations.

    I am developing an open source client/server middleware solution to connect to BerkeleyDB (OctopusDbm), and in parallel I am Founder and CEO of Buscaplus, an Internet search engine with about 1TB in BerkeleyDB databases so far, so I am kind of a Berkeley lover. But my requirements are also very strict, since I need a lot of processing from my limited economic resources.

    As part of OctopusDbm, which will be the cloud solution for Buscaplus to grow to millions of web pages and many TBs in size, I have to decide whether to keep the current Python client/server or move the server to something else, mainly because Octopus will have stored procedures in code or some other smart solution, and Python, as we all know, is slow compared to C and Java. So I started to think about it some weeks ago. I was totally convinced that a pure 100% Java server would do the job, and had already planned for it. But it did not do the job even for test #1, so my disappointment was very big.

    Let me take the chance to ask you some questions:
    - Which language would you choose to implement a server? Do you agree with me on C++ on Linux?
    - Is it possible to force BerkeleyDB JE to physically write to the file you define in the open(), rather than letting the software decide on the names and sizes of the physical data files?

    So the purpose of this post was to let the people who follow me know what is going on, and by no means to tell the Internet community that BerkeleyDB JE is a bad product, which seems to be the message you got from the post. It is a nice product for certain people with Java solutions, but not for me.

    Best regards
  • Hello Jorge, I'm glad you took a look at Berkeley DB Java Edition. Wow, you didn't have the reaction to the product that we'd hope developers would have. Let's find out why.

    Did you not notice the java.util.collections API? That supports HashMaps and all the other collections types as well. Yes, the underlying database is a btree, but do you really care? Btrees have always been the fastest access method in the core product, even in very large databases (PBs). Your blog post is very short and very strongly worded. If you're going to make such strong statements it might help us to know more details about your testing, your platform, etc. What version of the products did you test? It is not even clear what you mean by "Python"; were you testing the Berkeley DB Python (bsddb) APIs? What version of JE? What version of DB? What version of Java, from what vendor, using what command line options, on what operating system with what file system, what hardware? What Python version? What version of bsddb? How many runs of the test did you perform? Where is the raw data? Did you calculate a standard deviation to see if the tests were highly clustered or being affected by something outside of the JVM?

    Jorge, I have been working with Berkeley DB since its inception. I have been an employee working full-time on Berkeley DB products for eight years now. That's four years pre-Oracle and four years post-Oracle. If we "failed" or we "screwed" something up or we have no idea how Java works (I was an employee of Sun Microsystems and worked in JavaSoft before joining Sleepycat) then I'd like to learn more.

    Please feel free to bring your tests out into the open and discuss them on Oracle OTN Java forum so that we can help you tune them. Maybe we can help you find that 4x (or better) performance, or maybe we'll learn more about how we've failed our community.

    thanks, please do contact me directly if you'd like to talk more,

    -greg