DataNodes
Our HDFS cluster is only 90% full, but some datanodes have disks that are 100% full. When we mass-reboot the entire cluster, some of those datanodes completely fail to start.
In your case, balancing the data evenly across the cluster's datanodes should help you avoid individual disks filling up even while the cluster as a whole has free space. You can run the out-of-the-box Hadoop balancer periodically; it shuffles block replicas so that all datanodes end up consuming roughly the same amount of disk space.
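For example, a minimal sketch assuming a stock Hadoop 2.x/3.x command line (the 10% threshold is an arbitrary choice, not a recommendation):

    # Check per-datanode disk usage first to confirm the imbalance
    hdfs dfsadmin -report

    # Move block replicas until every datanode's usage is within
    # 10 percentage points of the cluster-wide average utilization
    hdfs balancer -threshold 10

The balancer only relocates block replicas between datanodes; it does not rewrite files, so it can run while the cluster is serving traffic.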
Master nodes must have a path.data directory whose contents persist across restarts, just like data nodes, because this is where the cluster metadata is stored. The cluster metadata describes how to read the data stored on the data nodes, so if it is lost then the data stored on the data nodes cannot be read.
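For reference, path.data is an Elasticsearch node setting; a minimal sketch, assuming a package-based install where /var/lib/elasticsearch lives on storage that survives restarts:

    # elasticsearch.yml: keep node data and cluster metadata on persistent storage
    path.data: /var/lib/elasticsearch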
The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk in the form of two files: the namespace image and the edit log. The namenode also knows the datanodes on which all the blocks for a given file are located; however, it does not store block locations persistently, because this information is reconstructed from datanodes when the system starts.
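Concretely, on a Hadoop 2.x/3.x namenode the directory configured by dfs.namenode.name.dir usually looks something like this (the transaction IDs here are illustrative, not from any real cluster):

    current/
      VERSION                                # cluster and namespace identifiers
      fsimage_0000000000000000042            # namespace image (latest checkpoint)
      fsimage_0000000000000000042.md5
      edits_inprogress_0000000000000000043   # edit log currently being written
      seen_txid                              # last transaction ID seen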
A file in HDFS may be split into many blocks, with replicas stored on many datanodes. The question, then, is how to find the datanodes that actually store a given file's blocks.
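One way to answer this, assuming you have shell access to an HDFS client (the path below is a placeholder), is hdfs fsck, which reports each block of a file along with the datanodes holding its replicas:

    # Print the file's blocks and, for each block, the datanodes storing a replica
    hdfs fsck /path/to/file -files -blocks -locations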