RethinkDB is a free and open-source scalable JSON database management system written in C++ which can be used by realtime web applications that require continuously updated query results.
RethinkDB makes use of a custom query language called ReQL which offers a way of manipulating JSON documents, and supports table joins, aggregation functions, and mixing queries with JavaScript expressions and map-reduce functions.
Potential use cases for RethinkDB:
Streaming analytics applications
Multiplayer games
Social networks
RethinkDB is NOT a good choice if:
You need full ACID (Atomicity, Consistency, Isolation, Durability) support.
You’re doing deep, computationally intensive analytics.
RethinkDB data management GUI with ReQLPro
History
RethinkDB was created in 2009 and open-sourced in 2012. The first version was an SSD-optimized storage engine for MySQL; the product later became a document DBMS similar to MongoDB. The first production-ready release of RethinkDB came in 2015 and provided support for the JSON data model, immediate consistency, sharding, and failover, among other features.
On October 5, 2016, the company announced that it was shutting down and would no longer offer production support because it had been unable to build a sustainable business. On February 6, 2017, the Cloud Native Computing Foundation purchased the rights to the source code and relicensed it under the Apache License 2.0.
Currently, RethinkDB is the second most popular database on GitHub, and it has lots of interest and support from the developer community.
RethinkDB Core Components
Indexes
RethinkDB indexes every document added to a table by the table's primary key attribute, which defaults to id when no primary key is specified at table creation. If an inserted document does not include the primary key field, a random unique ID is generated for it automatically. RethinkDB uses the primary key to place the document in the appropriate shard and to index it within that shard using a B-tree data structure. Fetching data by primary key is efficient because the query can be directed to the right shard and the document can then be looked up in that shard's B-tree.
Note: Sharding is the process of splitting a large table into smaller chunks (shards) that can be spread across multiple servers.
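The lookup path described above can be sketched with a small toy model. This is not RethinkDB's implementation: the `Table` and `Shard` classes, the `split_points` parameter, and the use of a sorted list in place of a real B-tree are all assumptions made for illustration.

```python
import bisect
import uuid


class Shard:
    """One shard; a sorted key list stands in for the B-tree (assumption)."""

    def __init__(self):
        self.keys = []   # primary keys, kept sorted
        self.docs = {}   # primary key -> document

    def insert(self, key, doc):
        if key not in self.docs:
            bisect.insort(self.keys, key)
        self.docs[key] = doc

    def get(self, key):
        # Binary search over sorted keys, similar in spirit to a B-tree descent.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.docs[key]
        return None


class Table:
    """Toy table that range-shards documents on their primary key."""

    def __init__(self, split_points, primary_key="id"):
        self.primary_key = primary_key
        self.split_points = sorted(split_points)
        self.shards = [Shard() for _ in range(len(self.split_points) + 1)]

    def _shard_for(self, key):
        # The key alone decides which shard is consulted.
        return self.shards[bisect.bisect_right(self.split_points, key)]

    def insert(self, doc):
        doc = dict(doc)
        # A document without a primary key value gets a random unique ID.
        doc.setdefault(self.primary_key, str(uuid.uuid4()))
        key = doc[self.primary_key]
        self._shard_for(key).insert(key, doc)
        return key

    def get(self, key):
        return self._shard_for(key).get(key)


table = Table(split_points=["m"])
table.insert({"id": "alice", "email": "alice@example.com"})
table.insert({"id": "zoe", "email": "zoe@example.com"})
generated = table.insert({"email": "anon@example.com"})
```

In this sketch, `table.get("zoe")` touches only the second shard, which is the point of routing by primary key: no other shard has to be scanned.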
RethinkDB Client Drivers
The RethinkDB client drivers are responsible for:
Opening a connection.
Performing a handshake.
Serializing the queries.
Sending the message to the server using the ReQL protocol.
Receiving the response and returning it to the calling application.
RethinkDB has several client drivers for different languages, some of which are supported internally by the RethinkDB team, and the others through community support.
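As a rough sketch of the serialize-and-send steps listed above, the following shows one way a driver might turn a query into bytes. The framing used here (an 8-byte query token and a 4-byte payload length, both little-endian, followed by a JSON body) is an assumption for illustration and should be checked against the official driver-writing guide. The helper names `frame_query` and `parse_frame` are hypothetical, the string labels in the example query are placeholders for the protocol's numeric term codes, and the handshake is not modelled at all.

```python
import json
import struct


def frame_query(token, query):
    """Serialize a query to JSON and prefix it with a token and a length."""
    payload = json.dumps(query, separators=(",", ":")).encode("utf-8")
    # "<QI": little-endian 8-byte unsigned token, 4-byte unsigned length
    # (assumed layout; verify against the protocol documentation).
    return struct.pack("<QI", token, len(payload)) + payload


def parse_frame(data):
    """Inverse of frame_query, as a server or test harness might do."""
    token, length = struct.unpack_from("<QI", data)
    body = data[12:12 + length]
    return token, json.loads(body.decode("utf-8"))


# Placeholder query: string labels stand in for numeric term codes.
example_query = ["START", ["COUNT", [["TABLE", ["employees"]]]]]
frame = frame_query(1, example_query)
```

The token lets the driver match a response to the query that produced it when several queries share one connection.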
RethinkDB Query Language
The RethinkDB query language (ReQL) offers a way of manipulating JSON documents. It was built on three principles:
It embeds into your programming language.
It is chainable.
It executes on the server.
Example of a chainable query:
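from rethinkdb import RethinkDB
r = RethinkDB()
# Connect to a RethinkDB server (defaults to localhost:28015)
conn = r.connect()
# Count the distinct email values in the 'employees' table; the chain runs on the server
r.table('employees').pluck('email').distinct().count().run(conn)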
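To make the three principles more concrete, here is a minimal sketch of how a chainable query builder can work in general. It is not the RethinkDB driver's code: the `Term` class and its `build` method are hypothetical names. Each chained call returns a new term that wraps the previous one, and nothing is evaluated locally; a real driver would send the finished tree to the server when `run` is called.

```python
class Term:
    """Immutable node in a query expression tree (illustrative only)."""

    def __init__(self, name, args=(), parent=None):
        self.name = name
        self.args = tuple(args)
        self.parent = parent

    def _chain(self, name, *args):
        # Return a new term instead of mutating this one.
        return Term(name, args, parent=self)

    def pluck(self, *fields):
        return self._chain("pluck", *fields)

    def distinct(self):
        return self._chain("distinct")

    def count(self):
        return self._chain("count")

    def build(self):
        """Return the chain as a list of (name, args) steps, innermost first."""
        steps = []
        node = self
        while node is not None:
            steps.append((node.name, node.args))
            node = node.parent
        return list(reversed(steps))


def table(name):
    return Term("table", (name,))


base = table("employees")
query = base.pluck("email").distinct().count()
```

Because each step returns a new object, `base` can be reused to start other queries without being affected by the chain built on top of it.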
Concurrency Control
RethinkDB makes use of block-level multi-version concurrency control (MVCC). When a write operation occurs while a read operation is being worked on, RethinkDB takes a snapshot of the B-Tree for each relevant shard and temporarily maintains different versions of the blocks in order to execute read and write operations concurrently.
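The snapshot idea can be illustrated with a toy copy-on-write store. This is a simplified model under stated assumptions, not RethinkDB's storage engine: the `BlockStore` class is hypothetical, blocks are plain strings, and a dictionary stands in for the B-tree's block mapping.

```python
from types import MappingProxyType


class BlockStore:
    """Toy copy-on-write block store (illustrative only)."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)   # block id -> contents (current version)

    def snapshot(self):
        # A reader gets a read-only view of the current version.
        # Writers never modify this mapping in place.
        return MappingProxyType(self._blocks)

    def write(self, block_id, contents):
        # Build a new version; unchanged blocks are shared with the old one.
        new_blocks = dict(self._blocks)
        new_blocks[block_id] = contents
        self._blocks = new_blocks


store = BlockStore({1: "old leaf", 2: "other leaf"})
reader_view = store.snapshot()        # a read starts
store.write(1, "new leaf")            # a write arrives mid-read
latest_view = store.snapshot()
```

Here `reader_view[1]` still returns the old block while `latest_view[1]` returns the new one, so the read and the write proceed without blocking each other. The old version is kept only for as long as a reader still holds it.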
RethinkDB query execution flow