Note: The Cinchapi Data Platform is powered by the Concourse Database, an open source project founded by Cinchapi CEO Jeff Nelson.
Concourse is designed to be low-maintenance and programmer-friendly, so we spend a lot of time building features that automate or remove traditional database tasks that detract from actual app development. One such feature is that Concourse automatically creates secondary indexes for all your data, so you can perform efficient predicate, range, and search queries on anything at any time.
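To make the idea concrete, here is a minimal Python sketch of a store that indexes every key as it is written. It is a toy under stated assumptions, not Concourse's implementation or API: the class `AutoIndexedStore` and its methods `set`, `find_eq`, and `find_between` are hypothetical names invented for illustration, and everything lives in memory.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class AutoIndexedStore:
    """Toy store that indexes every key on write (hypothetical, illustration only)."""

    def __init__(self):
        self.records = defaultdict(dict)                    # record id -> {key: value}
        self.index = defaultdict(lambda: defaultdict(set))  # key -> value -> record ids
        self.sorted_values = defaultdict(list)              # key -> sorted distinct values

    def set(self, key, value, record):
        # Remove the record from the index entry for its previous value, if any.
        old = self.records[record].get(key)
        if old is not None:
            self.index[key][old].discard(record)
        # Keep a sorted list of distinct values so range queries can use bisect.
        if value not in self.index[key]:
            insort(self.sorted_values[key], value)
        self.records[record][key] = value
        self.index[key][value].add(record)

    def find_eq(self, key, value):
        """Predicate query: records where key == value."""
        return set(self.index[key].get(value, ()))

    def find_between(self, key, low, high):
        """Range query: records where low <= value < high."""
        values = self.sorted_values[key]
        hits = set()
        for value in values[bisect_left(values, low):bisect_left(values, high)]:
            hits |= self.index[key][value]
        return hits
```

Nothing in the calling code ever declares an index. Writing `store.set("age", 35, 1)` is enough for `store.find_between("age", 30, 40)` to answer from the index rather than scanning every record.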
The motivation to index everything comes from the fact that deciding what to index is annoying, high-maintenance, and complicated. I’m sure you’ll agree if you’ve ever been bitten by a performance bug caused by forgetting to (or not knowing to) index certain columns in a table. While you certainly do need some indexes for your app to perform well at scale, being forced to do query analysis and constantly tune your index design to get it right is undesirable.
I’m fully aware that the conventional wisdom says you shouldn’t index everything because extraneous indexes take up disk space, hog memory, and slow down writes. The first point is moot, since disk space is relatively “cheap”, but the last two are valid and were carefully considered when building this feature.
Even though Concourse indexes everything, writes are still fast because we use a buffered storage system that durably stores writes immediately (without any random disk I/O) and quietly indexes in the background. This system is completely transparent to the user: as soon as you write data, it is durably persisted and available for querying, even while it waits in the buffer, since recently written data is cached in memory.
The buffered storage system is carefully designed to make sure that indexing data never blocks writes or sacrifices ACID consistency, even if the system crashes. So you can quickly stream data to Concourse and trust that your indexes will never become compromised.
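The write path described above can be sketched in Python roughly as follows. This is a simplified illustration under assumptions, not Concourse's actual storage engine: `BufferedStore`, `add`, `transport`, and `find` are hypothetical names. To keep the example deterministic, the indexing step is an explicit method; a real system would run it on a background thread. The sketch also omits crash recovery, which would replay the log on startup.

```python
import json
import os
import tempfile
from collections import defaultdict, deque


class BufferedStore:
    """Toy buffered store: durable append first, indexing later (hypothetical)."""

    def __init__(self, log_path):
        self.log = open(log_path, "a", encoding="utf-8")
        self.buffer = deque()                               # writes not yet indexed
        self.index = defaultdict(lambda: defaultdict(set))  # key -> value -> record ids

    def add(self, key, value, record):
        # A sequential append plus fsync makes the write durable
        # without any random disk I/O.
        self.log.write(json.dumps([key, value, record]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # Keep the write in memory so it is queryable right away.
        self.buffer.append((key, value, record))

    def transport(self, max_writes=1):
        """Move up to max_writes buffered writes into the index; return the count moved."""
        moved = 0
        while self.buffer and moved < max_writes:
            key, value, record = self.buffer.popleft()
            self.index[key][value].add(record)
            moved += 1
        return moved

    def find(self, key, value):
        # A read consults the index and the not-yet-indexed buffer.
        hits = set(self.index[key].get(value, ()))
        hits.update(r for k, v, r in self.buffer if k == key and v == value)
        return hits

    def close(self):
        self.log.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = BufferedStore(os.path.join(tmp, "buffer.log"))
        store.add("name", "jeff", 1)
        before = store.find("name", "jeff")  # answered from the buffer
        store.transport()
        after = store.find("name", "jeff")   # answered from the index
        store.close()
        return before, after
```

The point of the design is that `add` never waits on `transport`: the caller pays only for the append, and a query returns the same answer whether the write is still in the buffer or already indexed.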
Typically, indexes are most effective when they live in memory and don’t cause the database to page things in and out to disk. Obviously, Concourse can be expected to eventually reach a state where there is not enough memory to hold all its indexes, so there is logic to automatically evict the least recently used ones when there is memory pressure. Additionally, even if Concourse must go to disk to fetch an index that hasn’t been used in a while, we use Bloom filters and metadata to minimize the amount of disk I/O necessary to query the index.
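Both techniques can be illustrated with a short Python sketch. It is a generic least-recently-used cache paired with a simple Bloom filter, not Concourse's internal code: `BloomFilter`, `IndexCache`, `lookup`, and the `load_from_disk` callable are hypothetical, and the filter sizes are arbitrary.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class IndexCache:
    """Keeps at most `capacity` indexes in memory, evicting the least recently used."""

    def __init__(self, load_from_disk, capacity=2):
        self.load_from_disk = load_from_disk  # hypothetical callable: key -> {value: ids}
        self.capacity = capacity
        self.cached = OrderedDict()
        self.disk_reads = 0

    def get(self, key):
        if key in self.cached:
            self.cached.move_to_end(key)  # mark as most recently used
            return self.cached[key]
        self.disk_reads += 1
        index = self.load_from_disk(key)
        self.cached[key] = index
        if len(self.cached) > self.capacity:
            self.cached.popitem(last=False)  # evict the least recently used index
        return index


def lookup(cache, filters, key, value):
    # The per-key filters stay in memory; a negative answer is definitive,
    # so the index does not need to be fetched at all.
    if not filters[key].might_contain(value):
        return set()
    return cache.get(key).get(value, set())
```

In this sketch the Bloom filters are small enough to stay resident even after their index has been evicted, so a query for a value that is definitely absent usually costs no disk read.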
Automatically indexing data is obviously a big win for developers, since they always get super fast reads without impacting write performance and without ever needing to do query planning. But this is also a huge benefit to Concourse internally because it allows the storage engine to leverage tons of data for better query optimization. Java revolutionized developer productivity with managed memory, and I truly think Concourse can do something similar with automatic indexing.
Originally published at concoursedb.com on June 13, 2014.