Document database shootout: MongoDB vs. CouchBase

MongoDB and CouchBase are both document stores, and together they dominate the document database market. MongoDB is the clear leader in market share and CouchBase is the challenger. I wanted to check which one performs better and whether CouchBase is a real alternative to MongoDB. The question of “which is the better database” is far beyond the scope of this post, since the two differ in architecture and in many features. I also couldn’t test them across a variety of use cases and workloads. So I picked one data set and one set of queries and compared how the two databases handle them. This is quite a narrow comparison, but I hope it gives a clue about the general performance difference between the two.


I created a three-node CouchBase cluster and a three-node MongoDB cluster (no replica sets, just three data nodes, a configuration server and a mongos server) on similar hardware. This is a minimal setup that will do for our test. You can look at this old post for details on how to do it.

First of all, we will generate the test data. CouchBase comes with a handy tool for generating sample workload data called cbworkloadgen. Run it with the -h flag to get the full list of options. I ran it this way to generate 10 million documents in a bucket called “sampledata” that I created beforehand:

export PATH=/opt/couchbase/bin:$PATH

cbworkloadgen -n couch1:8091 -j -i 10000000 -t 3 -b sampledata

It produces random data that looks like this:

"name": "pymc100",
"age": 100,
"index": 100,
"body": "VTKGNKUHMP"

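The sample rows shown in this post follow a simple pattern: name is “pymc” plus the index, age equals the index modulo 101, and body is a fixed string. Below is a minimal Python sketch that builds documents of the same shape. It assumes the pattern holds for the whole data set, which is inferred only from the excerpts in this post and not from the cbworkloadgen documentation. make_doc is a hypothetical helper name.

```python
import json


def make_doc(index):
    """Build one document shaped like the samples in this post.

    Assumption: age == index % 101, inferred from the sample rows only.
    """
    return {
        "name": "pymc%d" % index,
        "age": index % 101,
        "index": index,
        "body": "VTKGNKUHMP",
    }


if __name__ == "__main__":
    for i in (100, 3648, 5578):
        print(json.dumps(make_doc(i)))
```

If the pattern does hold, it also gives a cheap sanity check for the aggregation results, since the expected average age can be computed directly from the index range.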
When this was done, I created a primary index and a secondary index on the sampledata bucket:

create primary index on sampledata using view;
create index ix1 on sampledata (doc.age) using view;

CouchBase has a unique mechanism of “views” which are basically stored map and reduce functions that are used for running calculations on the data. The views are persisted to disk and re-calculated periodically to reflect the real data in the bucket. I wanted to add a view based query to the test so I created my own custom view for average calculation, called “avg”. Here are its mapper and reducer:

[Image: mapper and reducer of the “avg” view]

Next, I used the cbtransfer utility to dump the data to text files:

cbtransfer http://couch1:8091 csv:./sampledata.csv -b sampledata -u Administrator -p manager

This command produces one file per shard, so first we have to concatenate those files into one large file. Unfortunately, although I specified csv output, what cbtransfer writes is not a flat csv file: each row holds the document in JSON format along with some extra, CouchBase-specific fields such as id and cas, which we don’t need here:

pymc3648,0,0,1497171080364752896,"{""name"": ""pymc3648"", ""age"": 12, ""index"": 3648, ""body"": ""VTKGNKUHMP""}",3,810,1
pymc5578,0,0,1497171080929804288,"{""name"": ""pymc5578"", ""age"": 23, ""index"": 5578, ""body"": ""VTKGNKUHMP""}",3,810,1
pymc6759,0,0,1497171081496559616,"{""name"": ""pymc6759"", ""age"": 93, ""index"": 6759, ""body"": ""VTKGNKUHMP""}",3,810,1
pymc11222,0,0,1497171082632822784,"{""name"": ""pymc11222"", ""age"": 11, ""index"": 11222, ""body"": ""VTKGNKUHMP""}",3,810,1
pymc11550,0,0,1497171083018174464,"{""name"": ""pymc11550"", ""age"": 36, ""index"": 11550, ""body"": ""VTKGNKUHMP""}",3,810,1
pymc12003,0,0,1497171083018829824,"{""name"": ""pymc12003"", ""age"": 85, ""index"": 12003, ""body"": ""VTKGNKUHMP""}",3,810,1
pymc12193,0,0,1497171083018895360,"{""name"": ""pymc12193"", ""age"": 73, ""index"": 12193, ""body"": ""VTKGNKUHMP""}",3,810,1
pymc12771,0,0,1497171083367022592,"{""name"": ""pymc12771"", ""age"": 45, ""index"": 12771, ""body"": ""VTKGNKUHMP""}",3,810,1
pymc13897,0,0,1497171083762532352,"{""name"": ""pymc13897"", ""age"": 60, ""index"": 13897, ""body"": ""VTKGNKUHMP""}",3,810,1

So I had to preprocess it a little to prepare it for importing into MongoDB. This shell command did the job of removing extra columns and extra quotes:

cat temp.json | cut -d',' -f5-8 | tr -s '"' | sed -e 's/^"//' -e 's/"$//' > sampledata.json

Now it looks more like tidy JSON that we can load into MongoDB:

{"name": "pymc9987665", "age": 78, "index": 9987665, "body": "VTKGNKUHMP"}
{"name": "pymc9988946", "age": 46, "index": 9988946, "body": "VTKGNKUHMP"}
{"name": "pymc9989042", "age": 41, "index": 9989042, "body": "VTKGNKUHMP"}
{"name": "pymc9989730", "age": 22, "index": 9989730, "body": "VTKGNKUHMP"}
{"name": "pymc9998739", "age": 42, "index": 9998739, "body": "VTKGNKUHMP"}

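The cut/tr/sed pipeline above relies on the exact number of commas inside each document. As an alternative, here is a rough Python sketch that lets the csv module undo the quoting and then re-serializes each document as one JSON line. It assumes the document is always the fifth column, as in the sample rows above. The function name and the decision to skip rows that do not contain a JSON object are my own choices, not part of cbtransfer.

```python
import csv
import io
import json


def csv_rows_to_json_lines(csv_text, value_column=4):
    """Yield the JSON document stored in one column of each CSV row.

    Assumption: the document is the fifth column (index 4), as in the
    sample rows in this post. Rows whose column is not a JSON object
    (for example a header row) are skipped.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= value_column:
            continue
        try:
            doc = json.loads(row[value_column])
        except ValueError:
            continue
        if isinstance(doc, dict):
            yield json.dumps(doc)


if __name__ == "__main__":
    sample = (
        'pymc3648,0,0,1497171080364752896,'
        '"{""name"": ""pymc3648"", ""age"": 12, ""index"": 3648, '
        '""body"": ""VTKGNKUHMP""}",3,810,1\n'
    )
    for line in csv_rows_to_json_lines(sample):
        print(line)
```

Because the csv module understands doubled quotes, this keeps working even if a document body happens to contain commas.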
Let’s load it into MongoDB so that CouchBase and MongoDB have the same data.

mongoimport --db guy --type json --file ./sampledata.json --stopOnError

After loading the data into MongoDB and sharding the new collection, I created an index on the age field, similar to the one I created in CouchBase:

mongos> use guy
switched to db guy
mongos> db.sampledata.createIndex({age:1})
{
        "raw" : {
                "guy1/mongo1.lan:27017" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 1,
                        "numIndexesAfter" : 2,
                        "ok" : 1,
                        "$gleStats" : {
                                "lastOpTime" : {
                                        "ts" : Timestamp(1497357688, 1),
                                        "t" : NumberLong(3)
                                },
                                "electionId" : ObjectId("7fffffff0000000000000003")
                        }
                }
        },
        "ok" : 1
}

Now we have our two competitors ready and we can begin our test.



Now we have a three-node CouchBase cluster and a three-node MongoDB cluster, each holding the same 10 million documents in an indexed bucket/collection called “sampledata”. All the servers have identical hardware.

The test is very simple and does not cover all possible workloads. I ran three queries on the sampledata bucket/collection: the first retrieves a single document by age, the second scans part of the index (equivalent to “where age between x and y”), and the third is an aggregation query that calculates the average age over the whole data set. I ran each query five times and took the average timing, to smooth out one-time spikes.
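The “run it five times and average” procedure can be sketched as a small Python helper. This is only an illustration of the method, not the script used for the numbers in this post. average_runtime is a hypothetical name, and run_query stands for any callable that executes one query against either database.

```python
import time


def average_runtime(run_query, runs=5):
    """Call run_query `runs` times and return the mean wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real client call.
    print(average_runtime(lambda: sum(range(100000))))
```

Note that repeated runs of the same query also warm up caches, so an average like this mostly reflects warm performance.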

I queried CouchBase in two different ways: using its query language, N1QL, and using the “avg” view we created earlier in this post (unfortunately N1QL cannot use custom views). MongoDB also has Map/Reduce functionality, so to keep things fair I added it to the comparison along with regular “find” queries. Map/Reduce in both databases was used only for the aggregation test.

Here is the MongoDB MapReduce query I used (I tried to keep it as similar as possible to the CouchBase one):

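db.sampledata.mapReduce(
	// map: emit every document's age under one shared key
	function () { emit(1, this.age); },
	// reduce: average the ages collected for that key
	function (key, values) {
		var total = 0, count = 0;
		for (var v in values) {
			total += values[v];
			count++;
		}
		return total / count;
	},
	{ out: { inline: 1 } })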
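One caveat about averaging inside a reduce function: MongoDB may call reduce more than once for the same key and feed earlier reduce outputs back in as values, so a reducer that returns an average can end up averaging averages. A common workaround is to carry (sum, count) pairs through reduce and divide only at the very end (in MongoDB that last step would go into a finalize function). The Python sketch below only illustrates the arithmetic. It is not the query used in this test, and all function names are hypothetical.

```python
from functools import reduce


def map_age(doc):
    """Emit a (sum, count) pair for one document."""
    return (doc["age"], 1)


def combine(a, b):
    """Merge two (sum, count) pairs; safe to apply to partial results."""
    return (a[0] + b[0], a[1] + b[1])


def finalize(pair):
    """Turn the final (sum, count) pair into an average."""
    total, count = pair
    return total / count


def naive_average(values):
    """Plain average, which is not safe to re-apply to partial averages."""
    return sum(values) / len(values)


if __name__ == "__main__":
    ages = [10, 20, 60]
    docs = [{"age": a} for a in ages]
    # Reduce in two stages, as a chunked or sharded run might do.
    partial = reduce(combine, map(map_age, docs[:2]))
    total = combine(partial, map_age(docs[2]))
    print("sum/count pairs:", finalize(total))
    print("average of averages:",
          naive_average([naive_average(ages[:2]), ages[2]]))
```

The two printed values differ, which is why the pair-based form is the safer pattern when partial results may be combined.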

The chart below shows how long each query took, in seconds. The chart’s scale is somewhat misleading, because the aggregation query in CouchBase N1QL never returned. I waited a long 9 minutes before aborting it, and this happened every time I ran it. In the chart data I set it to 120 seconds just to make the chart more readable, but it never actually finished. On the other hand, each query that ran for less than a second was recorded as 0.5 seconds.

To make it clearer, here is the real table the chart is based on (each value is an average of 5 runs; I treated any time greater than 5 minutes as infinity and any time under a second as 0.5 seconds):

MongoDB - CouchBase comparison (query time in seconds)

                       Single document    Range    Aggregation
MongoDB (find)         0.5                0.5      17.32
MongoDB (MapReduce)                                110
CouchBase (N1QL)       5.384.3                     Infinity
CouchBase (view)                                   0.5
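The recording rule used for the table and the chart can be written down as two tiny Python functions. This is a sketch of the rule as described in the text. Whether it was applied to each run or to the five-run average is not stated, so treat that detail as an assumption. The function names are hypothetical.

```python
import math


def recorded_seconds(measured, floor=1.0, timeout=300.0):
    """Apply the table's recording rule to one measured duration.

    Durations under `floor` seconds are recorded as 0.5, and durations
    over `timeout` seconds (5 minutes) are recorded as infinity.
    """
    if measured < floor:
        return 0.5
    if measured > timeout:
        return math.inf
    return measured


def chart_seconds(recorded, cap=120.0):
    """Replace infinity with the cap used to keep the chart readable."""
    return cap if math.isinf(recorded) else recorded


if __name__ == "__main__":
    for seconds in (0.2, 17.32, 540.0):
        value = recorded_seconds(seconds)
        print(seconds, value, chart_seconds(value))
```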

I was surprised by the poor performance of the N1QL queries. Looking at the explain plan of the queries in CouchBase, I can see that it uses the primary index and not the secondary index we created on the “age” field. Running on the secondary index might be faster, and I don’t know why the optimizer chose the other index. As you can see, the view-based aggregation query was very fast.
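To illustrate why the choice of index matters: a primary index is keyed on the document ID, so a filter on age has to examine every document, whereas a secondary index sorted on age can jump straight to the matching range. Here is a toy Python sketch of that difference, assuming an in-memory list stands in for the bucket. It is not how CouchBase or MongoDB actually implement their indexes, and all names are hypothetical.

```python
import bisect


def range_by_scan(docs, low, high):
    """Examine every document, as a query without a usable age index must."""
    return [d["index"] for d in docs if low <= d["age"] <= high]


def build_age_index(docs):
    """Return (age, index) pairs sorted by age: a toy secondary index."""
    return sorted((d["age"], d["index"]) for d in docs)


def range_by_index(age_index, low, high):
    """Binary-search the sorted pairs and read only the matching slice."""
    start = bisect.bisect_left(age_index, (low,))
    end = bisect.bisect_right(age_index, (high, float("inf")))
    return [doc_id for _, doc_id in age_index[start:end]]


if __name__ == "__main__":
    docs = [{"age": i % 101, "index": i} for i in range(1000)]
    age_index = build_age_index(docs)
    print(len(range_by_scan(docs, 10, 12)))
    print(len(range_by_index(age_index, 10, 12)))
```

Both functions return the same documents, but the scan touches all of them while the index lookup touches only the matching slice, which is the gap a range query on 10 million documents would feel.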

Sharp-eyed readers may have noticed that I created the CouchBase indexes using view and not gsi. Theoretically, gsi indexes should be more efficient for ad hoc queries. However, I also tested all the queries with gsi indexes and they did not perform any better than view based indexes (they were actually much slower).


Despite its caching layer, CouchBase is no match for MongoDB in ad hoc queries. MongoDB performed significantly better in every scenario, and the difference is even more evident in aggregation queries, where CouchBase just couldn’t deliver. But CouchBase still has one ace up its sleeve: its unique views mechanism. Querying data using views, even for an aggregate over a large data set, is faster than anything MongoDB has to offer. However, views are not suitable for every use case. You have to know the query in advance and create a custom view tailored to that query, and views are only eventually consistent so you may get stale data.

So as a general-purpose database, MongoDB is the clear winner here. But if you have a relatively small number of fixed aggregation queries that run over and over, CouchBase views may be your best fit.




2 Responses to Document database shootout: MongoDB vs. CouchBase

  1. avi says:

    The article is misleading and not precise. Note that views in Couchbase are updated asynchronously, meaning the data is not up to date immediately within the view structure. Meaning, you CANNOT compare views to a MongoDB index in any way, since the difference is in data consistency + query performance. In addition, what is your MongoDB deployment of choice? In addition, in Couchbase, did you set up replication? In MongoDB it seems like you did not. What write concern in MongoDB did you use? Was it journaled or in memory? Default is Journaled (bound by IO performance). In Couchbase, did you have PERSIST_TO or REPLICATE_TO set up? If not, then the default consistency level is MEMORY, meaning again, the data is not consistent at the same level, and you must compare apples to apples!

    • Guy Shilo says:

      Unfortunately you cannot compare “apples to apples” because MongoDB and CouchBase are not the same.
      Each has unique features that are not present in the other.
      In the post I mentioned that “views are only eventually consistent so you may get stale data”, and I also wrote that “Sharp-eyed readers may have noticed that I created the CouchBase indexes using view and not gsi. Theoretically, gsi indexes should be more efficient for ad hoc queries. However, I also tested all the queries with gsi indexes and they did not perform any better than view based indexes”.
      Anyway, this doesn’t seem relevant, since the data was at rest when I performed the tests and it had long since been persisted to disk.
      I did not use replication in any of the two databases and I also did not change any default parameters.

      You really cannot compare CouchBase views to MongoDB views or MongoDB MapReduce.
      I did not try to compare features, which is almost impossible, I wanted each database to make its best effort to run the queries using the features it has and see who is faster.
      I still think that adding replica sets to MongoDB or changing the persistence of CouchBase will not significantly change the outcome.
