Hadoop benchmarking utilities – Part 2

In part 1 we introduced TeraGen, TeraSort and TeraValidate. In this post we will cover another trio: NNBench, MRBench and TestDFSIO.

All the examples below were run on Cloudera CDH 5.12. On other distributions the jar location may vary, so run a `find` command (for example `find / -name "hadoop-mapreduce-client-jobclient*.jar"`) to locate it.

All commands should be run as the hdfs user.

NNBench

This test is designed to stress the NameNode and check whether its configuration is good enough for production. It generates a high rate of requests for creating, reading, renaming or deleting files in HDFS to simulate a busy cluster's workload.

The command is:

hadoop jar /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar nnbench -operation create_write -baseDir /NNBench -numberOfFiles 1000 -readFileAfterOpen true

Parameter 1: operation. The program can be run in four modes: create_write, open_read, rename and delete.

Parameter 2: baseDir, the directory where the files will be created.

Parameter 3: numberOfFiles, how many files to create. For production testing you should choose a number considerably higher than 1000.

Parameter 4: readFileAfterOpen, whether each file should be read back after it is opened or created (true/false).

Here I ran only the create_write operation, but you should also run open_read, rename and delete to get a full picture of your performance. The number of files on a real-world cluster should also be much higher than the 1000 I used. There are other parameters and options as well; you can see the full list in the documentation.
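If you want to script all four runs, a small loop can generate the commands. This is only a sketch: it prints the commands rather than executing them (pipe the output to `sh` to actually run them), and the jar path assumes the CDH 5.12 parcel layout used above. Note that create_write has to come first, since the other three operations work on the files it creates.

```shell
# Print one nnbench invocation per operation; pipe the output to sh to execute.
# The JAR path assumes the CDH 5.12 parcel layout from this post.
JAR=/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar

for op in create_write open_read rename delete; do
  echo "hadoop jar $JAR nnbench -operation $op" \
       "-baseDir /NNBench -numberOfFiles 1000 -readFileAfterOpen true"
done
```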

This is the output of the create_write run:

NameNode Benchmark 0.4
18/01/27 22:41:05 INFO hdfs.NNBench: Test Inputs:
18/01/27 22:41:05 INFO hdfs.NNBench:            Test Operation: create_write
18/01/27 22:41:05 INFO hdfs.NNBench:                Start time: 2018-01-27 22:43:05,454
18/01/27 22:41:05 INFO hdfs.NNBench:            Number of maps: 1
18/01/27 22:41:05 INFO hdfs.NNBench:         Number of reduces: 1
18/01/27 22:41:05 INFO hdfs.NNBench:                Block Size: 1
18/01/27 22:41:05 INFO hdfs.NNBench:            Bytes to write: 0
18/01/27 22:41:05 INFO hdfs.NNBench:        Bytes per checksum: 1
18/01/27 22:41:05 INFO hdfs.NNBench:           Number of files: 1000
18/01/27 22:41:05 INFO hdfs.NNBench:        Replication factor: 1
18/01/27 22:41:05 INFO hdfs.NNBench:                  Base dir: /NNBench
18/01/27 22:41:05 INFO hdfs.NNBench:      Read file after open: true
18/01/27 22:41:18 INFO hdfs.NNBench: Deleting data directory
18/01/27 22:41:18 INFO hdfs.NNBench: Creating 1 control files
18/01/27 22:41:20 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
18/01/27 22:41:20 INFO client.RMProxy: Connecting to ResourceManager at cloudera1.lan/192.168.1.103:8032
18/01/27 22:41:22 INFO client.RMProxy: Connecting to ResourceManager at cloudera1.lan/192.168.1.103:8032
18/01/27 22:41:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/01/27 22:41:24 INFO mapred.FileInputFormat: Total input paths to process : 1
18/01/27 22:41:25 INFO mapreduce.JobSubmitter: number of splits:1
18/01/27 22:41:25 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
18/01/27 22:41:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517083579599_0003
18/01/27 22:41:28 INFO impl.YarnClientImpl: Submitted application application_1517083579599_0003
18/01/27 22:41:28 INFO mapreduce.Job: The url to track the job: http://cloudera1.lan:8088/proxy/application_1517083579599_0003/
18/01/27 22:41:28 INFO mapreduce.Job: Running job: job_1517083579599_0003
18/01/27 22:41:40 INFO mapreduce.Job: Job job_1517083579599_0003 running in uber mode : false
18/01/27 22:41:40 INFO mapreduce.Job:  map 0% reduce 0%
18/01/27 22:42:01 INFO mapreduce.Job:  map 67% reduce 0%
18/01/27 22:43:10 INFO mapreduce.Job:  map 100% reduce 0%
18/01/27 22:43:22 INFO mapreduce.Job:  map 100% reduce 100%
18/01/27 22:43:22 INFO mapreduce.Job: Job job_1517083579599_0003 completed successfully
18/01/27 22:43:22 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=179
                FILE: Number of bytes written=260589
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=239
                HDFS: Number of bytes written=168
                HDFS: Number of read operations=7
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=1002
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=86959
                Total time spent by all reduces in occupied slots (ms)=8864
                Total time spent by all map tasks (ms)=86959
                Total time spent by all reduce tasks (ms)=8864
                Total vcore-milliseconds taken by all map tasks=86959
                Total vcore-milliseconds taken by all reduce tasks=8864
                Total megabyte-milliseconds taken by all map tasks=89046016
                Total megabyte-milliseconds taken by all reduce tasks=9076736
        Map-Reduce Framework
                Map input records=1
                Map output records=7
                Map output bytes=168
                Map output materialized bytes=175
                Input split bytes=115
                Combine input records=0
                Combine output records=0
                Reduce input groups=7
                Reduce shuffle bytes=175
                Reduce input records=7
                Reduce output records=7
                Spilled Records=14
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=201
                CPU time spent (ms)=4060
                Physical memory (bytes) snapshot=553672704
                Virtual memory (bytes) snapshot=5467148288
                Total committed heap usage (bytes)=524029952
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=124
        File Output Format Counters
                Bytes Written=168
18/01/27 22:43:22 INFO hdfs.NNBench: -------------- NNBench -------------- :
18/01/27 22:43:22 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
18/01/27 22:43:22 INFO hdfs.NNBench:                            Date & time: 2018-01-27 22:43:22,980
18/01/27 22:43:22 INFO hdfs.NNBench:
18/01/27 22:43:22 INFO hdfs.NNBench:                         Test Operation: create_write
18/01/27 22:43:22 INFO hdfs.NNBench:                             Start time: 2018-01-27 22:43:05,454
18/01/27 22:43:22 INFO hdfs.NNBench:                            Maps to run: 1
18/01/27 22:43:22 INFO hdfs.NNBench:                         Reduces to run: 1
18/01/27 22:43:22 INFO hdfs.NNBench:                     Block Size (bytes): 1
18/01/27 22:43:22 INFO hdfs.NNBench:                         Bytes to write: 0
18/01/27 22:43:22 INFO hdfs.NNBench:                     Bytes per checksum: 1
18/01/27 22:43:22 INFO hdfs.NNBench:                        Number of files: 1000
18/01/27 22:43:22 INFO hdfs.NNBench:                     Replication factor: 1
18/01/27 22:43:22 INFO hdfs.NNBench:             Successful file operations: 0
18/01/27 22:43:22 INFO hdfs.NNBench:
18/01/27 22:43:22 INFO hdfs.NNBench:         # maps that missed the barrier: 0
18/01/27 22:43:22 INFO hdfs.NNBench:                           # exceptions: 1000
18/01/27 22:43:22 INFO hdfs.NNBench:
18/01/27 22:43:22 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
18/01/27 22:43:22 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
18/01/27 22:43:23 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
18/01/27 22:43:23 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
18/01/27 22:43:23 INFO hdfs.NNBench:
18/01/27 22:43:23 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
18/01/27 22:43:23 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
18/01/27 22:43:23 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 2769
18/01/27 22:43:23 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 2769.0
18/01/27 22:43:23 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
18/01/27 22:43:23 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 1000
18/01/27 22:43:23 INFO hdfs.NNBench:

At the beginning you can see the parameters passed to the program, and the final report contains valuable information such as the start time and stats like average exec time and average write latency. Note that in this run all 1000 operations ended in exceptions (Successful file operations: 0), which is why the exec time and latency figures come out as Infinity and NaN.

MRBench

This test runs a small MapReduce job many times, producing an input file and then sorting its contents. It can be used to test YARN and its components, such as schedulers, resource pools and the ResourceManager. You can see the system's throughput and how it handles load.

The full documentation can be found in the Hadoop docs.

Here is the command I ran:

hadoop jar /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar mrbench -baseDir /MRBench

I ran it with the default parameters, supplying only the base directory. But you can control the test's behavior with parameters such as:

-numRuns: how many jobs to run.

-maps: how many mappers to use for each job, default is 2.

-reduces: how many reducers to use for each job, default is 1.

There are other options available as well; see the documentation for the full list.

Here is the output:

MRBenchmark.0.0.2
18/01/27 22:50:22 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
18/01/27 22:50:23 INFO mapred.MRBench: created control file: /MRBench/mr_input/input_1795319292.txt
18/01/27 22:50:23 INFO mapred.MRBench: Running job 0: input=hdfs://cloudera1.lan:8020/MRBench/mr_input output=hdfs://cloudera1.lan:8020/MRBench/mr_output/output_-1645006832
18/01/27 22:50:24 INFO client.RMProxy: Connecting to ResourceManager at cloudera1.lan/192.168.1.103:8032
18/01/27 22:50:25 INFO client.RMProxy: Connecting to ResourceManager at cloudera1.lan/192.168.1.103:8032
18/01/27 22:50:26 INFO mapred.FileInputFormat: Total input paths to process : 1
18/01/27 22:50:27 INFO mapreduce.JobSubmitter: number of splits:2
18/01/27 22:50:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517083579599_0004
18/01/27 22:50:29 INFO impl.YarnClientImpl: Submitted application application_1517083579599_0004
18/01/27 22:50:29 INFO mapreduce.Job: The url to track the job: http://cloudera1.lan:8088/proxy/application_1517083579599_0004/
18/01/27 22:50:29 INFO mapreduce.Job: Running job: job_1517083579599_0004
18/01/27 22:50:41 INFO mapreduce.Job: Job job_1517083579599_0004 running in uber mode : false
18/01/27 22:50:41 INFO mapreduce.Job:  map 0% reduce 0%
18/01/27 22:50:51 INFO mapreduce.Job:  map 50% reduce 0%
18/01/27 22:50:52 INFO mapreduce.Job:  map 100% reduce 0%
18/01/27 22:51:02 INFO mapreduce.Job:  map 100% reduce 100%
18/01/27 22:51:02 INFO mapreduce.Job: Job job_1517083579599_0004 completed successfully
18/01/27 22:51:03 INFO mapreduce.Job: Counters: 50
        File System Counters
                FILE: Number of bytes read=27
                FILE: Number of bytes written=386330
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=233
                HDFS: Number of bytes written=3
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=1
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=16848
                Total time spent by all reduces in occupied slots (ms)=6970
                Total time spent by all map tasks (ms)=16848
                Total time spent by all reduce tasks (ms)=6970
                Total vcore-milliseconds taken by all map tasks=16848
                Total vcore-milliseconds taken by all reduce tasks=6970
                Total megabyte-milliseconds taken by all map tasks=17252352
                Total megabyte-milliseconds taken by all reduce tasks=7137280
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Map output bytes=5
                Map output materialized bytes=39
                Input split bytes=230
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=39
                Reduce input records=1
                Reduce output records=1
                Spilled Records=2
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=342
                CPU time spent (ms)=2680
                Physical memory (bytes) snapshot=964153344
                Virtual memory (bytes) snapshot=8194236416
                Total committed heap usage (bytes)=920264704
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=3
        File Output Format Counters
                Bytes Written=3
DataLines       Maps    Reduces AvgTime (milliseconds)
1               2       1       39589

At the end you can see the shuffle-error counters (all zero here), and the last line shows the parameters the test ran with along with the average job time in milliseconds.
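To push the scheduler harder you can combine those parameters. The sketch below just prints a heavier invocation (20 jobs of 4 mappers and 2 reducers each; the numbers are arbitrary examples, and the jar path again assumes CDH 5.12):

```shell
# Print an mrbench invocation with explicit run/map/reduce counts.
# The JAR path assumes the CDH 5.12 parcel layout from this post.
JAR=/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar

echo "hadoop jar $JAR mrbench -baseDir /MRBench" \
     "-numRuns 20 -maps 4 -reduces 2"
```

The reported AvgTime is averaged over all 20 runs, which smooths out one-off scheduling delays.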

TestDFSIO

This tests the I/O throughput of HDFS, again by running MapReduce jobs. It loads HDFS with reads or writes and reports how many megabytes per second the system can process. For more details see the documentation.

Here is the command:

hadoop jar /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar TestDFSIO -write -nrFiles 8 -fileSize 1GB -resFile /tmp/TestDFSIOwrite.txt

Operations: -read, -write, -append, -clean. Normally you first run -write to create the files, then -read to read them back, and finally -clean to remove the files that were created. -append appends data to existing files.
-nrFiles: the number of files (equal to the number of map tasks).
-fileSize: the size of each file.

The above command demonstrates only the write part, but you should then also run the read and clean steps. It also does not represent a real-world cluster test: it uses only eight files of 1 GB each, because my cluster has only four nodes and very little storage. Running real-world parameters on this cluster would take forever to finish.

On a bigger cluster you should increase those parameters, to something like 64 files of 16 GB each.

Here is the output:

18/01/31 23:11:59 INFO fs.TestDFSIO: TestDFSIO.1.7
18/01/31 23:11:59 INFO fs.TestDFSIO: nrFiles = 8
18/01/31 23:11:59 INFO fs.TestDFSIO: nrBytes (MB) = 1024.0
18/01/31 23:11:59 INFO fs.TestDFSIO: bufferSize = 1000000
18/01/31 23:11:59 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
18/01/31 23:12:23 INFO fs.TestDFSIO: creating control file: 1073741824 bytes, 8 files
18/01/31 23:12:27 INFO fs.TestDFSIO: created control files for: 8 files
18/01/31 23:12:28 INFO client.RMProxy: Connecting to ResourceManager at cloudera1.lan/192.168.1.103:8032
18/01/31 23:12:29 INFO client.RMProxy: Connecting to ResourceManager at cloudera1.lan/192.168.1.103:8032
18/01/31 23:12:33 INFO mapred.FileInputFormat: Total input paths to process : 8
18/01/31 23:12:33 INFO mapreduce.JobSubmitter: number of splits:8
18/01/31 23:12:33 INFO Configuration.deprecation: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
18/01/31 23:12:33 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
18/01/31 23:12:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517432822605_0001
18/01/31 23:12:39 INFO impl.YarnClientImpl: Submitted application application_1517432822605_0001
18/01/31 23:12:41 INFO mapreduce.Job: The url to track the job: http://cloudera1.lan:8088/proxy/application_1517432822605_0001/
18/01/31 23:12:41 INFO mapreduce.Job: Running job: job_1517432822605_0001
18/01/31 23:12:57 INFO mapreduce.Job: Job job_1517432822605_0001 running in uber mode : false
18/01/31 23:12:57 INFO mapreduce.Job:  map 0% reduce 0%
18/01/31 23:13:22 INFO mapreduce.Job:  map 17% reduce 0%
18/01/31 23:13:37 INFO mapreduce.Job:  map 21% reduce 0%
18/01/31 23:13:42 INFO mapreduce.Job:  map 25% reduce 0%
18/01/31 23:13:57 INFO mapreduce.Job:  map 33% reduce 0%
18/01/31 23:14:02 INFO mapreduce.Job:  map 42% reduce 0%
18/01/31 23:14:17 INFO mapreduce.Job:  map 46% reduce 0%
18/01/31 23:14:22 INFO mapreduce.Job:  map 50% reduce 0%
18/01/31 23:14:40 INFO mapreduce.Job:  map 58% reduce 0%
18/01/31 23:14:46 INFO mapreduce.Job:  map 67% reduce 0%
18/01/31 23:14:53 INFO mapreduce.Job:  map 71% reduce 0%
18/01/31 23:15:05 INFO mapreduce.Job:  map 75% reduce 0%
18/01/31 23:15:17 INFO mapreduce.Job:  map 83% reduce 0%
18/01/31 23:15:27 INFO mapreduce.Job:  map 92% reduce 0%
18/01/31 23:15:41 INFO mapreduce.Job:  map 96% reduce 0%
18/01/31 23:15:45 INFO mapreduce.Job:  map 100% reduce 0%
18/01/31 23:15:56 INFO mapreduce.Job:  map 100% reduce 100%
18/01/31 23:15:56 INFO mapreduce.Job: Job job_1517432822605_0001 completed successfully
18/01/31 23:15:58 INFO mapreduce.Job: Counters: 50
        File System Counters
                FILE: Number of bytes read=340
                FILE: Number of bytes written=1332418
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1912
                HDFS: Number of bytes written=8589934670
                HDFS: Number of read operations=35
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=10
        Job Counters
                Launched map tasks=8
                Launched reduce tasks=1
                Data-local map tasks=7
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=316702
                Total time spent by all reduces in occupied slots (ms)=12058
                Total time spent by all map tasks (ms)=316702
                Total time spent by all reduce tasks (ms)=12058
                Total vcore-milliseconds taken by all map tasks=316702
                Total vcore-milliseconds taken by all reduce tasks=12058
                Total megabyte-milliseconds taken by all map tasks=324302848
                Total megabyte-milliseconds taken by all reduce tasks=12347392
        Map-Reduce Framework
                Map input records=8
                Map output records=40
                Map output bytes=614
                Map output materialized bytes=808
                Input split bytes=1016
                Combine input records=0
                Combine output records=0
                Reduce input groups=5
                Reduce shuffle bytes=808
                Reduce input records=40
                Reduce output records=5
                Spilled Records=80
                Shuffled Maps =8
                Failed Shuffles=0
                Merged Map outputs=8
                GC time elapsed (ms)=4183
                CPU time spent (ms)=82630
                Physical memory (bytes) snapshot=3815903232
                Virtual memory (bytes) snapshot=24589758464
                Total committed heap usage (bytes)=3297673216
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=896
        File Output Format Counters
                Bytes Written=78
18/01/31 23:15:58 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
18/01/31 23:15:58 INFO fs.TestDFSIO:            Date & time: Wed Jan 31 23:15:58 IST 2018
18/01/31 23:15:58 INFO fs.TestDFSIO:        Number of files: 8
18/01/31 23:15:58 INFO fs.TestDFSIO: Total MBytes processed: 8192.0
18/01/31 23:15:58 INFO fs.TestDFSIO:      Throughput mb/sec: 34.33778623554414
18/01/31 23:15:58 INFO fs.TestDFSIO: Average IO rate mb/sec: 34.591529846191406
18/01/31 23:15:58 INFO fs.TestDFSIO:  IO rate std deviation: 2.9249713556373753
18/01/31 23:15:58 INFO fs.TestDFSIO:     Test exec time sec: 210.889
18/01/31 23:15:58 INFO fs.TestDFSIO:

First of all, the job counters show that it really does put HDFS to work: over 8 GB written (HDFS: Number of bytes written=8589934670).

Second, the report at the bottom shows stats like average IO rate (MB/s) and throughput, which are almost identical here. The standard deviation can indicate whether one or more nodes are considerably slower than the rest of the cluster.
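The two numbers are computed differently: throughput is the total megabytes divided by the sum of all map task times, while the average IO rate is the mean of each file's individual rate. The sketch below illustrates the difference with hypothetical per-file figures (not taken from the run above):

```shell
# Hypothetical per-file stats: each line is "megabytes seconds" for one map task.
stats="1024 30
1024 29
1024 31
1024 28"

# Aggregate throughput: sum(MB) / sum(sec)
echo "$stats" | awk '{mb+=$1; s+=$2} END {printf "throughput %.2f MB/s\n", mb/s}'
# prints: throughput 34.71 MB/s

# Average IO rate: mean of the per-file rates (MB_i / sec_i)
echo "$stats" | awk '{r+=$1/$2; n++} END {printf "avg io rate %.2f MB/s\n", r/n}'
# prints: avg io rate 34.76 MB/s
```

When the per-file times are uneven (a straggler node), the two numbers diverge and the standard deviation grows.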

If you want to look at the files that were created during the test, they are in HDFS at “/benchmarks/TestDFSIO”:

[hdfs@cloudera1 ~]$ hdfs dfs -ls /benchmarks/TestDFSIO
Found 3 items
drwxr-xr-x - hdfs supergroup 0 2018-02-04 23:03 /benchmarks/TestDFSIO/io_control
drwxr-xr-x - hdfs supergroup 0 2018-02-04 23:05 /benchmarks/TestDFSIO/io_data
drwxr-xr-x - hdfs supergroup 0 2018-02-04 23:06 /benchmarks/TestDFSIO/io_write
[hdfs@cloudera1 ~]$ hdfs dfs -ls /benchmarks/TestDFSIO/io_data
Found 8 items
-rw-r--r-- 2 hdfs supergroup 1073741824 2018-02-04 23:04 /benchmarks/TestDFSIO/io_data/test_io_0
-rw-r--r-- 2 hdfs supergroup 1073741824 2018-02-04 23:04 /benchmarks/TestDFSIO/io_data/test_io_1
-rw-r--r-- 2 hdfs supergroup 1073741824 2018-02-04 23:04 /benchmarks/TestDFSIO/io_data/test_io_2
-rw-r--r-- 2 hdfs supergroup 1073741824 2018-02-04 23:05 /benchmarks/TestDFSIO/io_data/test_io_3
-rw-r--r-- 2 hdfs supergroup 1073741824 2018-02-04 23:05 /benchmarks/TestDFSIO/io_data/test_io_4
-rw-r--r-- 2 hdfs supergroup 1073741824 2018-02-04 23:05 /benchmarks/TestDFSIO/io_data/test_io_5
-rw-r--r-- 2 hdfs supergroup 1073741824 2018-02-04 23:06 /benchmarks/TestDFSIO/io_data/test_io_6
-rw-r--r-- 2 hdfs supergroup 1073741824 2018-02-04 23:06 /benchmarks/TestDFSIO/io_data/test_io_7
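As a quick capacity sanity check, the raw disk space the test data occupies is nrFiles × fileSize × replication. The listing above shows a replication factor of 2, so the eight 1 GB files take roughly:

```shell
# Raw HDFS disk usage of the test data: nrFiles * fileSize(GB) * replication.
awk 'BEGIN { printf "%.0f GB\n", 8 * 1 * 2 }'
# prints: 16 GB
```

Make sure the cluster has that much free space before scaling the parameters up.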

After you have finished examining the results of TestDFSIO, run it again with the -clean option (instead of -read/-write) to clean up the files left behind.
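The full write/read/clean cycle can be sketched as a short script. As before, it only prints the commands (pipe the output to `sh` to run them), and the jar path assumes the CDH 5.12 layout used above:

```shell
# Print the full TestDFSIO cycle: write, then read, then clean.
# The JAR path assumes the CDH 5.12 parcel layout from this post.
JAR=/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar

for op in write read; do
  echo "hadoop jar $JAR TestDFSIO -$op -nrFiles 8 -fileSize 1GB" \
       "-resFile /tmp/TestDFSIO-$op.txt"
done
echo "hadoop jar $JAR TestDFSIO -clean"
```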

This, along with the previous post, covers the main benchmarking tools that come with Hadoop.

 
