Storing binary data in MongoDB with GridFS

MongoDB provides two ways to store binary data. The first is the Binary (BinData) data type, which can hold binary content inside a regular document, subject to the 16MB BSON document size limit. For larger files MongoDB offers GridFS, a filesystem-like storage layer built on regular collections, so it can store very large files and can also be sharded and distributed across the MongoDB cluster. You can find more in-depth information in the MongoDB documentation.

GridFS splits each binary file into 255KB chunks and stores them in two collections: fs.files holds the metadata and fs.chunks holds the binary data itself along with some chunk-related metadata. Both collections are indexed automatically. Accessing GridFS is not integrated into the mongo shell; it is done via a separate utility called mongofiles, which lives in the same directory as the other MongoDB binaries.
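To make the chunk math concrete: the number of fs.chunks documents a file needs is just its length divided by the chunk size, rounded up. Here is a small self-contained sketch (255KB means 255 * 1024 = 261120 bytes, the chunkSize value that shows up in the file metadata later; the class and method names are invented for this example):

```java
public class ChunkMath {
    // GridFS default chunk size: 255 KB = 255 * 1024 bytes.
    static final long CHUNK_SIZE = 255L * 1024;

    // Number of fs.chunks documents needed for a file of the given length
    // (ceiling division).
    static long chunkCount(long fileLength) {
        return (fileLength + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    public static void main(String[] args) {
        // A 20357-byte file fits in a single chunk.
        System.out.println(chunkCount(20357));       // prints 1
        // A 1 MB file needs 5 chunks.
        System.out.println(chunkCount(1024 * 1024)); // prints 5
    }
}
```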

For our demonstration we will use a small JPEG, puppy.jpg, a picture of a cute puppy.

The picture isn’t large enough to fill more than one chunk, but it will do for a demonstration. First of all, we will use mongofiles to store the file in GridFS:

[root@localhost bin]# mongofiles -d images put /root/Downloads/puppy.jpg
2017-04-30T13:59:49.086+0300 connected to: localhost
added file: /root/Downloads/puppy.jpg

This is the simplest form of the command; it connects to the local mongod on the default port 27017. To connect to a remote host or a non-default port, add the --host and --port options. The -d option sets the database we want to use. You can see all available options in the documentation.

We can run mongofiles with the “list” option to list all the files in GridFS:

[root@localhost Downloads]# mongofiles -d images list
2017-05-01T09:29:44.359+0300 connected to: localhost
puppy.jpg 20357

As I mentioned earlier, the file's data and metadata are kept in fs.files and fs.chunks in the images database we created. Let's take a look at them:

> use images
switched to db images
> show collections
fs.chunks
fs.files
> db.fs.files.find({}).pretty()
{
 "_id" : ObjectId("5906d5bce138230b0fc5db8f"),
 "chunkSize" : 261120,
 "uploadDate" : ISODate("2017-05-01T06:29:16.624Z"),
 "length" : 20357,
 "md5" : "952c809825a3342904458b7ef7a994d5",
 "filename" : "puppy.jpg"
}

You can see that the files collection contains the file id, name, size, upload date and checksum. Now let’s look at the chunks collection:

> db.fs.chunks.find({}).pretty()
{
 "_id" : ObjectId("5906d5bce138230b0fc5db90"),
 "files_id" : ObjectId("5906d5bce138230b0fc5db8f"),
 "n" : 0,
 "data" : BinData(0,"/9j/2wBDAAwICAgJCAwJCQwRCwoLERUPDAwPFRgTExUTExgzJCQkJCQ...")
}

This collection holds not only the chunk metadata (our file is small, so it spans just one chunk) but also the binary data itself. The binary "data" field is very long, so only its beginning is shown here.
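Retrieving a file is conceptually just fetching all chunks with a matching files_id, sorting them by n, and concatenating their data fields. The driver does this for you; the sketch below only illustrates the reassembly idea on plain byte arrays (the Chunk class here is a stand-in for an fs.chunks document, not a driver type):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ChunkReassembly {
    // Stand-in for one fs.chunks document: its sequence number n and payload.
    static class Chunk {
        final int n;
        final byte[] data;
        Chunk(int n, byte[] data) { this.n = n; this.data = data; }
    }

    // Sort the chunks by n and concatenate their data fields, which is
    // what the driver does when it streams a GridFS file back to you.
    static byte[] reassemble(List<Chunk> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        chunks.stream()
              .sorted(Comparator.comparingInt((Chunk c) -> c.n))
              .forEach(c -> out.write(c.data, 0, c.data.length));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Chunks supplied deliberately out of order.
        List<Chunk> chunks = Arrays.asList(
                new Chunk(1, "world".getBytes()),
                new Chunk(0, "hello ".getBytes()));
        System.out.println(new String(reassemble(chunks))); // prints "hello world"
    }
}
```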

Now let’s retrieve the file, first using mongofiles and then using an external program.

[root@localhost Downloads]# ls -l
total 20
-rw-r--r-- 1 root root 20357 May 1 10:22 puppy.jpg
[root@localhost Downloads]# rm -f puppy.jpg
[root@localhost Downloads]# ls
[root@localhost Downloads]# mongofiles -d images get puppy.jpg
2017-05-01T10:23:34.013+0300 connected to: localhost
finished writing to puppy.jpg
[root@localhost Downloads]# ls -l
total 20
-rw-r--r-- 1 root root 20357 May 1 10:23 puppy.jpg


I deleted the file from the local filesystem, then got it back from GridFS. The file came back with the same size.
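Matching sizes is a weak check; the md5 value in fs.files allows a real content comparison against the retrieved file. Here is a minimal sketch of producing the same kind of lowercase hex digest locally using only the JDK (Md5Check and md5Hex are names invented for this example):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    // Lowercase hex MD5 digest of a byte array, the same format
    // GridFS stores in the md5 field of fs.files.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        // Digest of a known string, to show the output format; for the real
        // check you would digest the bytes of the retrieved file and compare
        // with the md5 value shown by db.fs.files.find().
        System.out.println(md5Hex("hello".getBytes(StandardCharsets.UTF_8)));
        // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```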

Now let’s try to get the file from a client program. I used Java for this demo but many other languages are supported. I wanted to keep the program as simple and basic as possible, so it is a “quick and dirty” program that doesn’t follow best practices.

I used Maven for the dependencies.
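The original dependency snippet did not survive in the post; a minimal pom.xml entry matching the legacy-driver imports used below would look like this (the artifact is the legacy mongo-java-driver; the exact version is an assumption, any 3.x release of that era has this API):

```xml
<dependency>
    <!-- Legacy MongoDB Java driver; provides com.mongodb.gridfs.GridFS -->
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.4.2</version> <!-- assumed version -->
</dependency>
```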



The Java driver documentation can be found on the MongoDB website.

package mongodb.gridfs;

import com.mongodb.DB;
import com.mongodb.MongoClient;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * @author guys
 */
public class ImageReader {
    public static void main(String[] args) {
        // First we need to create a mongo client and point it to the right host.
        // Put your mongod host name here; using the no-argument constructor
        // instead defaults to localhost on port 27017.
        MongoClient mongoClient = new MongoClient("");
        System.out.println("Successfully connected to MongoDB server.");
        // Now connect to a certain database on the server, represented by a DB object.
        DB db = mongoClient.getDB("images");
        System.out.println("Successfully connected to MongoDB database.");
        // From the DB, get a reference to the GridFS.
        GridFS gridfs = new GridFS(db);
        // Now use the findOne method to find our file.
        // If you want multiple files, use find() instead and get a list of
        // all the files which satisfy the find criteria.
        GridFSDBFile file = gridfs.findOne("puppy.jpg");
        System.out.println("Found the file.");
        // Now we create an output stream to a file on the local filesystem and
        // write the GridFS contents into this stream.
        try (FileOutputStream fos = new FileOutputStream("c:\\users\\guys\\puppy.jpg")) {
            file.writeTo(fos);
        } catch (IOException ex) {
            Logger.getLogger(ImageReader.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
After running the program, the puppy's picture was pulled from GridFS (running on Linux) and copied to my local Windows computer.


GridFS isn't a real filesystem like HDFS, and it does not support hierarchical directories.

It splits large files into 255KB chunks, internally stores the parts using the regular Binary data type, and wraps them in a filesystem-like API.

It is convenient to work with, but as the number of files grows they become hard to organize. There are tricks to extend GridFS with virtual directories, but they require writing code to handle it yourself. Still, it is the main option for storing files larger than the document size limit, like video or audio, inside the database.
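As an illustration of the virtual-directory trick, one common convention is to embed a path in the filename (for example "pets/dogs/puppy.jpg") and filter on the prefix when listing. The sketch below applies that filtering to plain strings; it is a naming convention, not a GridFS API, and the names here are invented for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class VirtualDirs {
    // List the "files" under a virtual directory by filtering on a filename
    // prefix, the same predicate you would hand to a GridFS query on the
    // filename field.
    static List<String> listDir(List<String> filenames, String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        return filenames.stream()
                        .filter(f -> f.startsWith(prefix))
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "pets/dogs/puppy.jpg",
                "pets/cats/kitten.jpg",
                "backups/2017-05-01.tar.gz");
        System.out.println(listDir(files, "pets/dogs")); // prints [pets/dogs/puppy.jpg]
    }
}
```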
