Using Amazon EC2 As Infrastructure For An Internet Search Engine
Posted by Jorge Alegre in Technology on June 23rd, 2009

Today I tested several Amazon EC2 instance types against the bulk write workload we run at my startup Buscaplus, an Internet search engine framework. We currently run on a set of 4 servers with SATA disks, and I am planning to move to Amazon.
We use Berkeley DB as the index database engine. It is pretty fast, especially if you size the memory cache correctly. Buscaplus needs to write huge amounts of data to disk, and a search engine's heavy database demands often turn those writes into a bottleneck. So disk write speed is crucial if we ever move to Amazon. I have not yet worked out a deployment and cloud design for many instances, but after today's tests it seems clear that Amazon EC2 is an option for Buscaplus.
Tests
Berkeley DB stores data as key -> value pairs. You can select BTREE or one of the other access methods. We use BTREE and a 128MB cache for all tests, and we write 100 bytes for each row of data. The keys are simply a counter zero-padded on the left, like '0000000345'.
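To make the test setup concrete, here is a minimal sketch of a write loop of this shape. It is not the actual Buscaplus benchmark: the names `make_key` and `bulk_write` are hypothetical, the payload is filler, and a plain Python dict stands in for the Berkeley DB BTREE handle so the snippet runs without any database binding installed.

```python
import time


def make_key(counter: int, width: int = 10) -> bytes:
    """Zero-pad the counter on the left, e.g. 345 -> b'0000000345'."""
    return str(counter).zfill(width).encode("ascii")


def bulk_write(store, n_rows: int, value_size: int = 100) -> float:
    """Write n_rows fixed-size records into store; return elapsed seconds.

    `store` is anything that supports store[key] = value with bytes keys
    and values (an assumption made for this sketch).
    """
    value = b"x" * value_size  # hypothetical filler payload, 100 bytes by default
    start = time.perf_counter()
    for i in range(n_rows):
        store[make_key(i)] = value
    return time.perf_counter() - start


if __name__ == "__main__":
    store = {}  # stand-in only; a real run would write to a Berkeley DB BTREE database
    elapsed = bulk_write(store, 100_000)
    print(f"{len(store)} rows in {elapsed:.2f} s")
```

Timings from the dict stand-in say nothing about disk performance; the point is only the key format, the fixed row size, and where the clock starts and stops.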
| Rows written | index-1 | EC2 small | EC2 large | EC2 ultra large | EC2 medium | EC2 high extra large |
| 1,000,000 | 13.35 | 18.60 | 9.50 | 9.50 | 9.00 | 7.99 |
| 3,000,000 | 39.81 | 44.62 | 27.47 | 26.19 | 26.14 | 25.90 |
| 20,000,000 | – | – | – | – | – | Unstable |
index-1 is one of our current servers. I would conclude that the “medium” instance is a great option: at only $0.20 / hour it performs well, better than our current infrastructure.
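A quick way to compare the columns is to divide the index-1 figure by each instance's figure. The sketch below does that, assuming the table values are elapsed times (lower is better), which the post implies but does not state. The dictionary keys are shortened labels and `speedup_vs_baseline` is a hypothetical helper name.

```python
# Figures copied from the table above; assumed to be elapsed times.
timings = {
    "index-1": {1_000_000: 13.35, 3_000_000: 39.81},
    "small": {1_000_000: 18.60, 3_000_000: 44.62},
    "large": {1_000_000: 9.50, 3_000_000: 27.47},
    "ultra large": {1_000_000: 9.50, 3_000_000: 26.19},
    "medium": {1_000_000: 9.00, 3_000_000: 26.14},
    "high extra large": {1_000_000: 7.99, 3_000_000: 25.90},
}


def speedup_vs_baseline(timings, baseline="index-1"):
    """Return baseline_time / instance_time per instance and row count.

    Values above 1.0 mean the instance finished faster than the baseline.
    """
    base = timings[baseline]
    return {
        name: {rows: round(base[rows] / t, 2) for rows, t in runs.items()}
        for name, runs in timings.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, ratios in speedup_vs_baseline(timings).items():
        print(name, ratios)
```

Under that assumption the medium instance comes out roughly 1.5 times faster than index-1 at both sample sizes, while the small instance is slower than index-1 in both.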
I also found that when dealing with a lot of data, small instances are of course a “no-no”, but so are the bigger instances when they use local disks. With heavy I/O, even big instances may perform badly if load at that time is high. This was not the case with EBS: with heavy I/O on EBS I got great results every time, so I would definitely go with EBS.
The 20 million row tests were unstable even on a $0.80 / hour High CPU Extra Large instance. They ended up producing a DB table of more than 3GB.
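As a rough sanity check on that size, the raw payload of the 20 million row run can be estimated from the row format described in the tests. The 10-byte key width is an assumption taken from the sample key '0000000345'; the 3GB figure is treated as a lower bound because the post only says "more than 3GB".

```python
ROWS = 20_000_000
KEY_BYTES = 10      # assumption: width of the sample key '0000000345'
VALUE_BYTES = 100   # from the test description

# Raw key + value bytes, ignoring any database overhead.
raw_bytes = ROWS * (KEY_BYTES + VALUE_BYTES)

# Reported on-disk size is "more than 3GB", so this ratio is a lower bound.
min_overhead = 3 * 10**9 / raw_bytes

print(f"raw payload: {raw_bytes / 10**9:.1f} GB")
print(f"on-disk size is at least {min_overhead:.2f}x the raw payload")
```

The raw data works out to about 2.2 GB, so the file on disk is at least about 1.36 times the payload. That gap would be consistent with B-tree page and per-record overhead, although the post does not break it down.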