This is not a full-size post, just a small, quick one that you may find helpful in some situations:
Every now and then you need to find the physical location of a data block (mainly on which data node it resides).
There is more than one way to get this information, but the one I use is the HDFS WebUI. It works on Apache Hadoop and on Cloudera (I haven’t checked other distributions).
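For comparison, one of the other ways is the `hdfs fsck` command with the `-files -blocks -locations` options, which prints each block of a file together with the data nodes that hold it. Here is a minimal Python sketch that wraps it. It assumes the `hdfs` client is on your PATH and configured for your cluster; the helper names and the example path are made up for illustration.

```python
import subprocess


def fsck_command(path):
    """Build the fsck command line that reports blocks and their locations."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


def block_report(path):
    """Run fsck for one HDFS path and return its raw text report.

    Assumes the `hdfs` client is installed and can reach the cluster.
    """
    result = subprocess.run(fsck_command(path), capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical path; replace it with a file that exists in your HDFS.
    print(block_report("/user/me/data.csv"))
```

The report is plain text meant for reading, so this sketch returns it as is instead of trying to parse it.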
Point your browser to: http://[your active nameNode]:50070 (50070 is the default NameNode web port on Hadoop 2.x; Hadoop 3.x uses 9870 by default).
You will see a web page like this:
Open the last menu item “utilities” and choose “browse the file system”.
This will show you all the directories in your HDFS. Browse until you get to the file you want:
In the listing you can also see the replication factor of the file, which is 3 in this example.
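If you prefer a script over clicking, the same directory listing, including the replication factor, is available from the WebHDFS REST API through the `LISTSTATUS` operation. A small sketch, assuming WebHDFS is enabled on the NameNode, there is no Kerberos or other authentication in front of it, and the port is the same one used for the WebUI. The function names and the host and directory in the usage part are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def list_status_url(namenode, directory, port=50070):
    """Build the WebHDFS LISTSTATUS URL for an HDFS directory."""
    return f"http://{namenode}:{port}/webhdfs/v1{quote(directory)}?op=LISTSTATUS"


def replication_by_name(response):
    """Map each file name in a LISTSTATUS response to its replication factor."""
    entries = response.get("FileStatuses", {}).get("FileStatus", [])
    return {
        entry["pathSuffix"]: entry["replication"]
        for entry in entries
        if entry.get("type") == "FILE"  # directories carry no replication factor
    }


def fetch_replication(namenode, directory, port=50070):
    """Query the NameNode and return {file name: replication factor}."""
    with urlopen(list_status_url(namenode, directory, port)) as resp:
        return replication_by_name(json.load(resp))


if __name__ == "__main__":
    # Hypothetical host and directory; adjust to your cluster.
    print(fetch_replication("namenode.example.com", "/user/me"))
```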
Clicking the file name will bring up this window:
The block information drop-down list lets you choose which block of the file you want to see (if the file spans more than one block).
Then, under “availability”, you can see the list of data nodes that hold a copy of this block.
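The same block-to-data-node mapping can also be fetched without a browser. The sketch below uses the `GET_BLOCK_LOCATIONS` WebHDFS operation, which as far as I can tell is what the file browser page itself calls. Treat that as an assumption: the operation is an internal, undocumented one, and the response shape used here (`LocatedBlocks`, `locatedBlocks`, `block.blockId`, `locations[].hostName`) may differ between Hadoop versions. The function names, host, and path are made up for illustration, and the sketch assumes no authentication is required.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def block_locations_url(namenode, path, port=50070):
    """Build the WebHDFS URL that asks the NameNode for a file's block locations."""
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?op=GET_BLOCK_LOCATIONS"


def hosts_per_block(response):
    """Return a list of (block id, [data node host names]) pairs.

    Assumes the response shape described above; missing keys yield empty results.
    """
    blocks = response.get("LocatedBlocks", {}).get("locatedBlocks", [])
    return [
        (
            block.get("block", {}).get("blockId"),
            [loc.get("hostName") for loc in block.get("locations", [])],
        )
        for block in blocks
    ]


def fetch_block_hosts(namenode, path, port=50070):
    """Query the NameNode and list the data nodes holding each block of the file."""
    with urlopen(block_locations_url(namenode, path, port)) as resp:
        return hosts_per_block(json.load(resp))


if __name__ == "__main__":
    # Hypothetical host and file; adjust to your cluster.
    for block_id, hosts in fetch_block_hosts("namenode.example.com", "/user/me/data.csv"):
        print(block_id, hosts)
```

Each pair corresponds to one entry of the block drop-down list, and the host names are the ones shown under “availability”.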