Design of BerkeleyDB Client/Server OctopusDbm


Last weekend I worked on the design of the new modules for OctopusDbm, and it looks like some nice features are coming soon. The first release, 0.6, has been built, and I will soon start documenting the current modules and design.

Community

Any open source project needs a community: a group of people who like the technology and contribute ideas, code, testing, analysis, etc. Learning how to build one will be a nice experience for me. Over the following weeks I will start promoting the project and trying to attract people who love it. Spread the word if you read this; this tech can be pretty cool.

Dbm

The client/server implementation is designed so that applications already running BerkeleyDB locally need only minimal code changes. Almost all API methods are implemented on the remote server.

Currently you may have:

from bsddb3 import db
Dbm = db.DB()
Dbm.open(sPath, db.DB_BTREE, db.DB_CREATE)
Dbm.put('Name', 'Jorge')
Dbm.close()

And with OctopusDbm:

from bsddb3 import db  # assumed: DB_BTREE and DB_CREATE still come from bsddb3
from octopusdbm import Dbm
dbm = Dbm(hostName, user, password)
dbm.open(sPath, db.DB_BTREE, db.DB_CREATE)
dbm.put('Name', 'Jorge')
dbm.close()

So you can basically open, put, create a cursor, get, get table information, etc., the same as you do locally. Keep in mind that every database method is a call to the server, so the example above makes three server calls: one each for open, put and close. If you need better performance, there are direct operations where open and close are integrated, like getDirect() and getListDirect(), as well as bulk operations like getList() and putList() that work on tables that are already open.
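To make the round-trip cost concrete, here is a minimal sketch that counts simulated server calls. The class `FakeServerDbm` is a hypothetical in-memory stand-in, and the signatures of `putList()` and `getDirect()` are assumptions for illustration only; only the method names come from the description above, not from the actual OctopusDbm API.

```python
class FakeServerDbm:
    """Hypothetical in-memory stand-in that counts simulated server calls."""

    def __init__(self):
        self.tables = {}   # path -> dict of key/value pairs
        self.calls = 0     # number of simulated round trips
        self.path = None   # currently opened table, if any

    def open(self, path):
        self.calls += 1
        self.path = path
        self.tables.setdefault(path, {})

    def put(self, key, value):
        self.calls += 1
        self.tables[self.path][key] = value

    def putList(self, pairs):
        # Assumed signature: one call carries many key/value pairs.
        self.calls += 1
        self.tables[self.path].update(pairs)

    def close(self):
        self.calls += 1
        self.path = None

    def getDirect(self, path, key):
        # Assumed signature: open, get and close folded into one call.
        self.calls += 1
        return self.tables.get(path, {}).get(key)


names = {'Name%d' % i: 'Jorge%d' % i for i in range(100)}

# One put() per record: open + 100 puts + close.
one_by_one = FakeServerDbm()
one_by_one.open('clients.db')
for key, value in names.items():
    one_by_one.put(key, value)
one_by_one.close()

# Bulk write: open + one putList() + close.
bulk = FakeServerDbm()
bulk.open('clients.db')
bulk.putList(names)
bulk.close()

print(one_by_one.calls, bulk.calls)
```

In this toy model the record-by-record version costs one call per put plus the open and close, while the bulk version stays at three calls no matter how many records are written.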

No Db Sessions

One of the design points of OctopusDbm is that you don’t need to open connections to the database or create sessions before running operations against it. The idea is that getting and putting data is as simple as getting an HTML page: a request is performed and you get the result. This makes it easier to connect to 10 or 20 cloud images without a connection pool, etc. Imagine having a pool of 3 connections to each of 100 cloud images: that would be 300 database connections to manage for every node that needs database connectivity to the cloud.

This is of course open to discussion, so post your ideas, etc.

DbmCloud

We introduce this object for access to the cloud. So far the distribution has been designed to be horizontal. That is, with a cloud of 10 images (virtual machines), table 1 goes to cloud image 1, table 2 to cloud image 2, ..., and table 11 back to cloud image 1. This object also manages partitions horizontally, as long as the table has an integer primary key or index. So tables in the cloud must have integer keys if you want to partition them; tables with string indexes will not be partitioned for now. I think this is OK for the first versions. You can always bypass DbmCloud, implement your own, participate in the project with your own design of cloud distribution and partitions, etc.
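The table placement described above can be sketched in a few lines. The function names are hypothetical, and the modulo partitioning scheme is an assumption of this sketch: the post only says that partitioning requires an integer key, not how the partition is chosen.

```python
def image_for_table(table_number, num_images):
    """Round-robin placement: table 1 -> image 1, table 11 -> image 1 with 10 images."""
    return (table_number - 1) % num_images + 1


def partition_for_key(primary_key, num_partitions):
    """Assumed scheme: modulo on the integer primary key.

    Returns None for non-integer keys, which are not partitioned.
    """
    if isinstance(primary_key, bool) or not isinstance(primary_key, int):
        return None
    return primary_key % num_partitions


for table_number in (1, 2, 10, 11):
    print(table_number, '->', image_for_table(table_number, 10))

print(partition_for_key(42, 4))
print(partition_for_key('Jorge', 4))
```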

So far the decision is to have an XML file with all schemas and tables, defining the number of partitions and the number of cloud images, organized by domain and application name. Very big systems could have several domains, each containing applications with their own cloud design. Small apps would have 1 domain and 1 app.
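As a rough idea of what such a file could look like, here is a sketch that parses one with the standard library. The element and attribute names (`cloud`, `domain`, `application`, `images`, `table`, `partitions`) are invented for illustration; the real schema has not been defined in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one domain, one application, two tables.
CLOUD_XML = """
<cloud>
  <domain name="main">
    <application name="shop" images="10">
      <table name="clients" partitions="4"/>
      <table name="orders" partitions="8"/>
    </application>
  </domain>
</cloud>
"""


def load_cloud_config(xml_text):
    """Return {(domain, application): {'images': int, 'tables': {name: partitions}}}."""
    root = ET.fromstring(xml_text)
    config = {}
    for domain in root.findall('domain'):
        for app in domain.findall('application'):
            tables = {
                table.get('name'): int(table.get('partitions'))
                for table in app.findall('table')
            }
            config[(domain.get('name'), app.get('name'))] = {
                'images': int(app.get('images')),
                'tables': tables,
            }
    return config


config = load_cloud_config(CLOUD_XML)
print(config[('main', 'shop')])
```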

One cool aspect of DbmCloud is that every database object (data table, index file, foreign key) will be distributed, and queries on them will run on different cloud images, which should boost performance, even more so when smart partitioning is in place.

Models

For OctopusDbm to be an alternative to SQL databases like MySQL, we need a model design that will:

  • Allow us to have tabulated data in columns
  • Make it easy to implement persistence subsystems like Hibernate, JPA, etc.

We thought of doing this inside Python, similar to Django models, where you define class attributes using classes for the types and dictionary class attributes:

oClient = Client(Name='Jorge', Place='Madrid')
oClient.save()
 
oClient = Client.get(Name='Jorge')
oClient.Place = 'Madrid'
oClient.save()

The first block creates an object and writes it to the database lazily. The second one fetches the client object and updates it in the database. I think the models will support cloud distribution and partitions. In the class you define the attribute types, index names, physical names, etc.
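To show the shape of the save/get calls, here is a toy sketch of a model base class backed by an in-memory list. Everything in it (`Model`, `_store`, `_rows`) is hypothetical, it ignores lazy writes, typed attributes and indexes, and it is not the planned OctopusDbm implementation.

```python
class Model:
    """Toy base class: keyword attributes, save() and get() on an in-memory store."""

    _store = None  # each subclass lazily gets its own list of saved rows

    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)

    @classmethod
    def _rows(cls):
        # Create the list on the subclass itself so models do not share rows.
        if cls.__dict__.get('_store') is None:
            cls._store = []
        return cls._store

    def save(self):
        rows = self._rows()
        if self not in rows:   # new object: insert; existing object: already updated
            rows.append(self)
        return self

    @classmethod
    def get(cls, **criteria):
        # Return the first saved row matching all criteria, or None.
        for row in cls._rows():
            if all(getattr(row, k, None) == v for k, v in criteria.items()):
                return row
        return None


class Client(Model):
    pass


oClient = Client(Name='Jorge', Place='Madrid')
oClient.save()

found = Client.get(Name='Jorge')
found.Place = 'Barcelona'
found.save()

print(Client.get(Name='Jorge').Place)
```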

The primary key in tables can be an id or another value. There will be a type IdGenerator that generates automatic ids for rows, but you can define your own primary key values. With cloud and partition support, we need an integer primary key for partitioning.
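A minimal sketch of how an id generator and user-defined keys could coexist follows. Only the name `IdGenerator` comes from the design; the `next_id()` method, the `primary_key_for()` helper and the dictionary rows are assumptions made for this example.

```python
import itertools


class IdGenerator:
    """Hands out increasing integer ids, which also suit integer-key partitioning."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)

    def next_id(self):
        return next(self._counter)


def primary_key_for(row, generator, key_field=None):
    """Use the row's own field as the key if one is named, else generate an id."""
    if key_field is not None:
        return row[key_field]
    return generator.next_id()


generator = IdGenerator()
auto_key_1 = primary_key_for({'Name': 'Jorge'}, generator)
auto_key_2 = primary_key_for({'Name': 'Ana'}, generator)
own_key = primary_key_for({'Code': 'ES-28', 'Name': 'Madrid'}, generator, key_field='Code')

print(auto_key_1, auto_key_2, own_key)
```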

The first versions will probably lack foreign keys, but later on we can implement foreign keys with model integrity. Besides ForeignKey types, I thought of Links between tables to perform soft links, for example to get the literal for an id in a parametric table.
