Few weeks ago, a message on LinkedIn caught my eye. It was introducing a new database with a funny (some will say disgusting) name I have never heard of before: CockroachDB.
The message was promising: a distributed, scalable database that uses standard SQL (Yay !). So I had to take a closer look at it. I did not have the chance to test it thoroughly yet, so some of the information I write here is from my own experience and some is from the documentation only.
The company behind this unique database is Cockroach Labs and it is an open source database. They also offer their database as a service in the cloud.
We were thought several things about databases in recent years:
- Relational databases do not scale out well.
- Distributed databases must be “NoSQL”
- Distributed databases cannot support strong consistency (the CAP theorem).
In many ways, CockroachDB smashes those theories.
CockroachDB is a relational database that stores data in tables. It has all the functionality of a regular rdbms such as indexes, joins, constraints and distributed transactions. It also uses the PostgreSQL communication protocol, so any client software or driver that is compatible with PostgreSQL will also work with CockroachDB. And at the same time it is a distributed database with nodes that form a cluster, which is fault tolerant and scales easily.
The infrastructure of CockroacDB is a key-value store inspired by google Spanner, but on top of it there is a SQL API that hides the underlying infrastructure. The database automatically splits data to chunks called ranges (I guess it’s an equivalent to shards) and then distributes them across the nodes. There is an automatic rebalancing and replication so manual maintenance is minimal. There is also a concept of replication zones that enables a different replication factor for different tables or databases.
Let’s try it. I created a Three node cluster of Linux machines. On each machine you should install the go language along with some prerequisite packages:
rpm --import https://mirror.go-repo.io/centos/RPM-GPG-KEY-GO-REPO curl -s https://mirror.go-repo.io/centos/go-repo.repo | tee /etc/yum.repos.d/go-repo.repo yum install gcc gcc-c++ libstdc++ golang cmake xz
Now download the distribution package from here. There is no Yum/apt repository nor rpm/deb installer so you will have to just run it manually. The archive is only 14.3 MB in size. Extract it and copy the executable to a directory in your PATH:
tar xfz cockroach-latest.linux-amd64.tgz cp -i cockroach-latest.linux-amd64/cockroach /usr/local/bin
Now we will start the first node. Since we have no rpm package, there is no service defined, unless you create your own scripts. The documentation does not mention support for parameter file, only in line parameters. Not a very convenient setup:
cockroach start --insecure \ --host=cockroach2.lan \ --background
If everything works as expected you get this message:
CockroachDB node starting at 2017-06-01 14:14:21.498450933 +0300 IDT build: CCL v1.0.1 @ 2017/05/25 15:17:49 (go1.8.1) admin: http://cockroach2.lan:8080 sql: postgresql://[email protected]:26257?sslmode=disable logs: /root/cockroach-data/logs store: path=/root/cockroach-data status: restarted pre-existing node clusterID: 3ae861ed-aed7-4de7-814f-a3beb2d71336 nodeID: 1
Now run this on the other nodes so they join the cluster. Make sure to change the host names:
cockroach start --insecure \ --host=cockroach1.lan \ --background \ --join=cockroach2:26257
In order to avoid the hassle of creating users and passwords we started the cluster in insecure mode, where there is no user control. There is an option to add credentials and even ssl encryption. As you can see in the startup message, each node exposes an admin web interface at http://[host name]:8080, let’s see how it looks like:
The page above shows a list of the nodes in the cluster and their status.
The page above shows the replication status, and there are many more useful pages. However, the web interface is more on the information side than on the action side. You can see lots of valuable information, but if you want to take actions then you have to go back to command line or try a 3rd party tool with PostgreSQL drivers.
CockroachDB also comes with a sql command line client, here is how to invoke it:
cockroach sql --insecure --host=cockroach3.lan --port=26257 # Welcome to the cockroach SQL interface. # All statements must be terminated by a semicolon. # To exit: CTRL + D.
Inside, you can run standard sql commands just as you would on MySQL or PostgreSQL databases:
[email protected]:26257/> create database guy; CREATE DATABASE [email protected]:26257/> show databases; +--------------------+ | Database | +--------------------+ | crdb_internal | | guy | | information_schema | | pg_catalog | | system | +--------------------+ (5 rows) [email protected]:26257/guy> set database=guy; SET
The commands above created a new database named “guy” and set it as the active database. Now let’s create a table, insert some data and query it:
CREATE TABLE [email protected]:26257/guy> show tables; +--------+ | Table | +--------+ | table1 | +--------+ (1 row) [email protected]:26257/guy> INSERT INTO table1 VALUES (1, 'Guy'); INSERT 1 [email protected]:26257/guy> INSERT INTO table1 VALUES (2,'Danny'); INSERT 1 [email protected]:26257/guy> INSERT INTO table1 VALUES (3,'Joe'); INSERT 1 [email protected]:26257/guy> select * from table1; +----+-------+ | id | name | +----+-------+ | 1 | Guy | | 2 | Danny | | 3 | Joe | +----+-------+ (3 rows)
The database looks and reacts just like a regular relational database. The SQL dialect is close to PostgreSQL and MySQL but is not the same (for example, it uses string instead of varchar) . you can look at the SQL reference here.
And here is the table we created in the admin console:
This database fills a gap between old school relational databases and NoSQL’s. Classic RDBMS’s do not scale easily and NoSQL’s usually trade consistency for performance. This database can scale easily and it seems really easy to manage (sharding, replication and rebalancing are automatic) and yet retains a RDBMS-like transactions and strong consistency. Keeping a distributed database consistent and at the same time responsive and well performing is a big challenge. I did not have the chance to test its performance nor how it handles heavy load, so I cannot say if it really meets that challenge.
This is the first production release of Cockroach DB so it has some rough edges:
- The documentation is mostly introductory. There is no in-depth guide or reference.
- There is no installer that sets Cockroach DB as a Linux service. You have to run it manually or create your own scripts.
- It cannot read its parameters from a parameter file, so you need to pipe the parameters directly to the executable if you want to change something, like this:
echo 'num_replicas: 5' | cockroach zone set .default --insecure -f
- Some of the SQL language features are not supported yet, look here for the full list.
But it also has some brilliant ideas like using the key-value store under the hood , API compatibility with PostgreSQL and of course supporting standard SQL (some will argue if supporting SQL is an advantage at all, but I think it is).
This is a very interesting effort that can become a very good database, as long as it has competitive performance. It’s only the first release and I believe that most of the problems will be addressed in future releases. I will keep following Cockroach DB and try to cover it in greater detail in future posts.