Sunday, July 7, 2013

Hadoop Rack Awareness (1.0.4)

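To enable Hadoop rack awareness, you need to create a script that maps each node's address to a rack, and then point the topology.script.file.name property in core-site.xml at that script. Hadoop passes one or more addresses to the script as arguments and expects one rack path per address on standard output.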

Below is the script I use, which I obtained from this site. The script is named topology.sh.
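#!/bin/bash
# Maps each host/IP argument to its rack using topology.data.
# The script uses arrays, so it needs bash rather than a plain POSIX sh.
HADOOP_CONF=/etc/hadoop
while [ $# -gt 0 ] ; do
  nodeArg=$1
  result=""
  # Scan the data file; the last matching line wins.
  while read -r line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done < "${HADOOP_CONF}/topology.data"
  shift
  # Fall back to the default rack for nodes not listed in the data file.
  if [ -z "$result" ] ; then
    echo -n "/default-rack "
  else
    echo -n "$result "
  fi
done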

The topology.data file lists one node per line, with the IP address followed by its rack:
10.0.0.11  /rack1
10.0.0.12  /rack1
10.0.0.13  /rack1
10.0.0.14  /rack1
10.0.0.15  /rack2
10.0.0.16  /rack2
10.0.0.17  /rack2
10.0.0.18  /rack2
10.0.0.19  /rack3
10.0.0.20  /rack3
10.0.0.21  /rack3
10.0.0.22  /rack3
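
Before wiring the script into Hadoop, it can help to check the lookup logic by hand. The sketch below is a small POSIX sh re-implementation of the same lookup, not the script above; lookup_racks and the temporary data file are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: resolve each argument against a topology data file.
# Usage: lookup_racks DATA_FILE HOST...
lookup_racks() {
  data_file=$1
  shift
  for node in "$@"; do
    rack=""
    # Each data line is "<host-or-ip> <rack>"; the last matching line wins.
    while read -r host mapped rest; do
      if [ "$host" = "$node" ]; then
        rack=$mapped
      fi
    done < "$data_file"
    # Unknown nodes fall back to /default-rack, as in topology.sh.
    printf '%s ' "${rack:-/default-rack}"
  done
  printf '\n'
}

# Demo data in a temporary file so nothing under /etc/hadoop is touched.
demo_data=$(mktemp)
cat > "$demo_data" <<'EOF'
10.0.0.11  /rack1
10.0.0.15  /rack2
EOF

lookup_racks "$demo_data" 10.0.0.11 10.0.0.15 10.0.0.99
rm -f "$demo_data"
```

With this demo data, the two known addresses resolve to /rack1 and /rack2, and the unknown one falls back to /default-rack. Running the real topology.sh with a few IP addresses as arguments should behave the same way.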

Place these two files in the /etc/hadoop folder on your namenode only. (You can of course use another directory, but make sure you change the path in both the script and core-site.xml.) Once that is done, add the topology.script.file.name property to core-site.xml, with the full path to topology.sh as its value. You only need to do this on the namenode.
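
The property takes the usual name/value form. The snippet below is a minimal sketch that prints the block to paste inside the <configuration> element of core-site.xml; print_topology_property is a hypothetical helper name, and the path assumes the files live in /etc/hadoop as described above.

```shell
#!/bin/sh
# Assumed location, matching the paths used in this post.
TOPOLOGY_SCRIPT=/etc/hadoop/topology.sh

# Hypothetical helper: print the property block for a given script path.
print_topology_property() {
  cat <<EOF
<property>
  <name>topology.script.file.name</name>
  <value>$1</value>
</property>
EOF
}

print_topology_property "$TOPOLOGY_SCRIPT"
```

Hadoop runs the script as a command, so it is worth checking that topology.sh is executable by the user that starts the namenode (for example with chmod +x).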

Once you are done, restart Hadoop (run stop-all.sh followed by start-all.sh). To verify that your cluster is now rack aware, use this command: hadoop dfsadmin -report. You should see an extra line showing the rack for each datanode.
[hadoopuser@hadoop-name-node hadoop]$ hadoop dfsadmin -report
Configured Capacity: 5143534043136 (4.68 TB)
Present Capacity: 4881124573184 (4.44 TB)
DFS Remaining: 2865411211264 (2.61 TB)
DFS Used: 2015713361920 (1.83 TB)
DFS Used%: 41.3%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 12 (12 total, 0 dead)

Name: 10.0.0.12:50010
Rack: /rack1
Decommission Status : Normal
Configured Capacity: 428627836928 (379.19 GB)
DFS Used: 188051693553 (155.14 GB)
Non DFS Used: 21867446287 (40.37 GB)
DFS Remaining: 218708697088(183.69 GB)
DFS Used%: 43.87%
DFS Remaining%: 51.03%
Last contact: Mon Jul 08 09:44:33 SGT 2013


Name: 10.0.0.26:50010
Rack: /rack3
Decommission Status : Normal
Configured Capacity: 428627836928 (379.19 GB)
DFS Used: 166991044608 (135.52 GB)
Non DFS Used: 21867454464 (40.37 GB)
DFS Remaining: 239769337856(203.3 GB)
DFS Used%: 38.96%
DFS Remaining%: 55.94%
Last contact: Mon Jul 08 09:44:33 SGT 2013
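
On a larger cluster it is easier to summarise the Rack lines than to read the whole report. The following is a minimal sketch, assuming the report format shown above; count_per_rack is a hypothetical helper name, and a short excerpt stands in for the live report.

```shell
#!/bin/sh
# Hypothetical helper: count datanodes per rack from a dfsadmin report on stdin.
count_per_rack() {
  awk '/^Rack:/ { n[$2]++ } END { for (r in n) print r, n[r] }' | sort
}

# Against a live cluster: hadoop dfsadmin -report | count_per_rack
# Here a two-node excerpt in the same format stands in for the report.
count_per_rack <<'EOF'
Name: 10.0.0.12:50010
Rack: /rack1
Name: 10.0.0.26:50010
Rack: /rack3
EOF
```

Any datanode that shows up under /default-rack is one the script could not find in topology.data.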
