Avian Parallel Databases

The top-of-the-line databases all execute highly optimized code designed to run on the highest-performing multi-CPU, multi-core systems. The basic architecture of database software resembles that of an operating system: protected sections of code handle multi-core functionality, and a great deal of thought goes into producing the desired results, data caching, data locality, preventing stale data, and so on.

So how could any parallel database system compete with the perfection of the top-of-the-line databases? By thinking outside the box and embracing the coming kilo-core future.

In kilo-core systems, it makes sense that each core would have some local RAM or extended cache. This core-specific memory would prevent the huge delays caused by a thousand cores all trying to access the same shared system RAM.

So imagine that each core in a 1,000-core system loaded some portion of a database into its local RAM. If each core had 1 MB of local RAM available, an entire 1 GB database could be held in high-speed memory at all times, providing a significant speed improvement over reading data off a hard disk.
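To make the arithmetic concrete, here is a minimal Python sketch of dealing a table out across cores. The function name `partition_rows`, the round-robin scheme, and the sample rows are assumptions made for illustration; any scheme that gives each core a roughly equal slice would serve the same purpose.

```python
def partition_rows(rows, num_cores):
    """Deal rows round-robin into num_cores shards (one list per core)."""
    shards = [[] for _ in range(num_cores)]
    for i, row in enumerate(rows):
        shards[i % num_cores].append(row)
    return shards


# Capacity check for the figures in the text (decimal units).
CORES = 1_000
LOCAL_RAM_BYTES = 1_000_000                 # 1 MB per core
total_capacity = CORES * LOCAL_RAM_BYTES    # 1,000,000,000 bytes = 1 GB

# Hypothetical sample data: 10,000 customer rows spread over 1,000 cores.
rows = [{"id": i, "name": f"customer-{i}"} for i in range(10_000)]
shards = partition_rows(rows, CORES)
```

With these numbers, each of the 1,000 shards ends up holding 10 rows, and the combined local memory adds up to the 1 GB in the example.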

Additional performance improvements would come if each core were responsible for indexing only the data held in its own memory, rather than relying on a shared index. Each core's index would be far smaller, so searches would be much faster.

Now consider that a 1 GB database broken into 1,000 pieces leaves only about 1 MB of data on each core, little enough that every field could be indexed. Imagine retrieving data from any field at indexed speeds. Suddenly all of those off-index queries, such as searching for a name in a comment field, would be completed just as quickly as indexed queries.
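As a rough sketch of what "index everything" could look like on one core, the Python below builds a small local index over every word of every field. The `Shard` class, its `find` method, and the sample rows are hypothetical, and splitting on whitespace is a deliberate simplification of real text indexing.

```python
from collections import defaultdict


class Shard:
    """One core's slice of the data plus a local index over every field."""

    def __init__(self, rows):
        self.rows = list(rows)
        # Maps (field, lowercase word) -> positions of matching rows.
        self.index = defaultdict(set)
        for pos, row in enumerate(self.rows):
            for field, value in row.items():
                for token in str(value).lower().split():
                    self.index[(field, token)].add(pos)

    def find(self, field, word):
        """Return rows whose `field` contains `word`, using the index only."""
        positions = self.index.get((field, word.lower()), ())
        return [self.rows[p] for p in sorted(positions)]


shard = Shard([
    {"id": 1, "name": "Ada Smith", "comment": "Referred by Grace Jones"},
    {"id": 2, "name": "Grace Jones", "comment": "Prefers email"},
])
hits = shard.find("comment", "grace")   # a name found inside a comment field
```

Here the search for a name in the comment field is a dictionary lookup rather than a scan, which is the effect the paragraph above describes.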

Now consider what happens when a SQL statement is submitted to an Avian parallel database: the SQL-interpreter bird sings (broadcasts) the SQL request to all 1,000 cores, and every core starts searching its local RAM for data that matches the statement. Instead of multiple threads competing for limited system resources, all of the resources search simultaneously for the requested data. Instead of preprocessors optimizing requests and scheduling disk reads for optimum results, the data are simply searched in parallel and the matching values are returned.
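The broadcast-and-gather pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `search_shard` and `broadcast_query` are invented, a Python predicate stands in for the parsed SQL statement, and a thread pool stands in for the cores. In standard CPython, threads typically will not run this kind of scan truly in parallel, so the sketch shows the shape of the idea, not its speed.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, predicate):
    """What a single core would do: scan only its own local rows."""
    return [row for row in shard if predicate(row)]


def broadcast_query(shards, predicate, max_workers=8):
    """Send the same predicate to every shard, then merge the answers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(lambda s: search_shard(s, predicate), shards)
        # Flatten the per-shard result lists in shard order.
        return [row for part in partials for row in part]


# Hypothetical data already spread across three "cores".
shards = [
    [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}],
    [{"id": 3, "city": "Oslo"}],
    [{"id": 4, "city": "Cairo"}],
]
matches = broadcast_query(shards, lambda r: r["city"] == "Oslo")
```

No shard needs to know about any other shard; the only shared steps are the broadcast at the start and the merge at the end.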

And perhaps most importantly, the data held in the cores wouldn’t have to be homogeneous. Instead of requiring all “Name” data to be in a Name table with one exact structure defined in the database, the names could come from any name resource, including database tables, XML files, JSON lists, etc. Each core would only have to know how to search its own data. The problems with field sizes and data consistency would go away.
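One way to picture heterogeneous cores is to give every data source the same small search method and let each one handle its own format internally. The Python sketch below assumes three hypothetical source classes (`CsvSource`, `JsonSource`, `XmlSource`) and a shared method name `names_matching`; none of these come from an existing system.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


class CsvSource:
    """Names stored as a table with a 'name' column."""
    def __init__(self, text):
        self.rows = list(csv.DictReader(io.StringIO(text)))

    def names_matching(self, fragment):
        f = fragment.lower()
        return [r["name"] for r in self.rows if f in r["name"].lower()]


class JsonSource:
    """Names stored as a JSON list of strings."""
    def __init__(self, text):
        self.names = json.loads(text)

    def names_matching(self, fragment):
        f = fragment.lower()
        return [n for n in self.names if f in n.lower()]


class XmlSource:
    """Names stored as <name> elements anywhere in an XML document."""
    def __init__(self, text):
        self.root = ET.fromstring(text)

    def names_matching(self, fragment):
        f = fragment.lower()
        return [e.text for e in self.root.iter("name")
                if e.text and f in e.text.lower()]


sources = [
    CsvSource("id,name\n1,Ada Smith\n2,Bob Jones\n"),
    JsonSource('["Grace Smith", "Alan Turing"]'),
    XmlSource("<people><name>Linus Smithson</name><name>Carol Wu</name></people>"),
]
# The same request goes to every source; each searches its own format.
found = [name for src in sources for name in src.names_matching("smith")]
```

The caller never sees a table definition. It only relies on every source answering the same question in its own way.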

Now imagine the capabilities of a mega-core system running a parallel database. Even with a modest 1 MB of local RAM per core, a million cores could hold a terabyte database and return results on any field in a relatively short time. The mind boggles at the potential.

The ideas for an Avian Parallel Database presented here illustrate how massively parallel systems should produce significantly better performance than traditional databases whenever more than one piece of data is required. Retrieving the name of the customer with ID = 12345 would not be faster, but retrieving the names of all customers who meet certain criteria should be significantly faster, because thousands (or eventually millions) of data sources could be searched simultaneously.
