HADOOP
Links
- https://ccp.cloudera.com/display/DOC/Documentation
- http://hadoop.apache.org/common/docs/current/
- http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
- https://ladd00.triumf.ca/hadoop/
Management
Base install
  cd /triumfcs/trshare/olchansk/linux/hadoop/
  rpm --import RPM-GPG-KEY-cloudera
  rpm -vh --install cloudera-cdh-4-0.noarch.rpm
  yum install hadoop-hdfs hadoop-hdfs-fuse
  ln -s /home/olchansk/sysadm/hadoop/conf.daq_test /etc/hadoop/
  alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.daq_test 50
  alternatives --display hadoop-conf
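A quick sanity check that the base install and the alternatives link took effect (a sketch; package and path names as above):
  rpm -q hadoop-hdfs hadoop-hdfs-fuse    # both packages should report an installed version
  alternatives --display hadoop-conf     # conf.daq_test should be shown as the current choice
  ls -l /etc/hadoop/conf                 # the symlink chain should end at /etc/hadoop/conf.daq_test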
Create a name node
- yum install hadoop-hdfs-namenode
- chkconfig hadoop-hdfs-namenode off
- TBW
- add to /etc/rc.local: firewall rules and service startup
  iptables -F hadoop-namenode
  iptables -N hadoop-namenode
  iptables -A hadoop-namenode -p tcp -s 142.90.0.0/16 --dport 8020 -j ACCEPT
  iptables -A hadoop-namenode -p tcp -s 142.90.0.0/16 --dport 50070 -j ACCEPT
  iptables -A hadoop-namenode -p tcp --dport 8020 -j REJECT
  iptables -A hadoop-namenode -p tcp --dport 50070 -j REJECT
  iptables -A INPUT -j hadoop-namenode
  iptables -L -v
  service hadoop-hdfs-namenode restart
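Once the service is running, a hedged check that the name node is listening on the expected ports and sees its data nodes:
  netstat -an | grep LISTEN | egrep '8020|50070'    # RPC and web ports, as firewalled above
  su - hdfs -c "hdfs dfsadmin -report"              # summary of configured capacity and live data nodes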
Create a data node
(only SL6 is supported. SL5 has problems with dead /hdfs mounts)
- install data node software
  yum install hadoop-hdfs-datanode
  chkconfig hadoop-hdfs-datanode off
- FIXME: adjust hadoop UID/GID somehow - it is different on every machine! wrong, wrong, wrong!
- configure data node
  mkdir /data8/hdfs_data
  chown -R hdfs.hdfs /data8/hdfs_data
  ln -s /data8/hdfs_data /datah
- add ladd08 to the hadoop cluster:
- cd /etc/hadoop/conf
- in hdfs-site.xml: add "/data8/hdfs_data" to dfs.data.dir
- in hdfs-site.xml: increase dfs.datanode.failed.volumes.tolerated by one (it should equal the number of dfs.data.dir entries minus 1); a sanity check sketch follows at the end of this section
- add the new node to hosts
- add the ladd08 IP address to topology.data
- add the new node to topology.perl
- restart the name server:
- login to hdfs@nameserver, run: hadoop dfsadmin -refreshNodes
- observe data node is listed as dead on the nameserver web page
- add to /etc/rc.local: firewall rules, service startup and hdfs mount
  iptables -F hadoop
  iptables -N hadoop
  iptables -A hadoop -p tcp -s 142.90.0.0/16 --dport 50010 -j ACCEPT
  iptables -A hadoop -p tcp -s 142.90.0.0/16 --dport 50020 -j ACCEPT
  iptables -A hadoop -p tcp -s 142.90.0.0/16 --dport 50075 -j ACCEPT
  iptables -A hadoop -p tcp --dport 50010 -j REJECT
  iptables -A hadoop -p tcp --dport 50020 -j REJECT
  iptables -A hadoop -p tcp --dport 50075 -j REJECT
  iptables -A INPUT -j hadoop
  iptables -L -v
  service hadoop-hdfs-datanode restart
  mkdir /hdfs
  hadoop-fuse-dfs dfs://ladd12:8020 /hdfs
- start the data node: run /etc/rc.local
- observe data node is now listed as live on the nameserver web page
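A sanity check sketch for the hdfs-site.xml edits above (assumes this CDH4 release supports "hdfs getconf -confKey"; the key names are the non-deprecated equivalents of dfs.data.dir, per the deprecated-properties link in the Links section):
  hdfs getconf -confKey dfs.datanode.data.dir                   # should include /data8/hdfs_data
  hdfs getconf -confKey dfs.datanode.failed.volumes.tolerated   # should equal (number of data dirs) - 1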
decommission a data node
- add the node name to "hosts.exclude"; it must be typed exactly as it is entered in "hosts"
- login hdfs@namenode
- hdfs dfsadmin -refreshNodes
- go to the hdfs web interface "Live Nodes" page, observe the data node status is "decommission in progress"
- go to the hdfs web interface "Decommissioning Nodes" page, observe the node is listed and watch the "Under Replicated Blocks" count go down to zero
- after the node is no longer listed in "Decommissioning Nodes" and is listed as "Decommissioned" in "Live Nodes", it can be shut down.
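Put together, a sketch of the sequence for a hypothetical node ladd08 (the exclude-file path assumes the /etc/hadoop/conf directory used above; the name must match "hosts" exactly):
  echo "ladd08.triumf.ca" >> /etc/hadoop/conf/hosts.exclude    # example hostname
  su - hdfs -c "hdfs dfsadmin -refreshNodes"
  su - hdfs -c "hdfs dfsadmin -report"    # the node's "Decommission Status" should move from "Normal" to "Decommissioned"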
create user directory
- login hdfs@namenode
- hadoop fs -mkdir /users/trinat
- hadoop fs -chown trinat /users/trinat
rebalance the data
(does not work with)
- login hdfs@namenode
- hdfs balancer
- watch the grass grow
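The balancer also accepts an optional imbalance threshold in percent (10 is the default); a hedged sketch for a tighter rebalance:
  hdfs balancer -threshold 5    # run until every data node is within 5% of the average cluster utilization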
Setup quotas
- ssh hdfs@namenode
- hdfs dfsadmin -setSpaceQuota 600g /users/trinat
note: the quota applies to the combined size of files and their replicas. With the default replication count of 3, the user can create 600g/3 = 200g worth of files.
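A matching sketch for setting and later removing the quota (clrSpaceQuota is the standard dfsadmin counterpart; the 600g value is raw space including replicas):
  hdfs dfsadmin -setSpaceQuota 600g /users/trinat   # 600 GB raw, i.e. roughly 200 GB of files at replication 3
  hdfs dfsadmin -clrSpaceQuota /users/trinat        # remove the space quota entirely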
Report quotas
- (as any user)
- hdfs dfs -count -q /users/"*"
The reported numbers are not obvious, for example:
  ladd12:fs$ hdfs dfs -count -q /users/"*"
      none   inf          none           inf   1    5      7809498 /users/lindner
      none   inf          none           inf   1    1    130360111 /users/olchansk
      none   inf  600000000000  208986905727   2  146 130337698091 /users/trinat
Here:
- 600000000000 is the value set by dfsadmin -setSpaceQuota
- 130337698091 is the size of all files not counting replicas (same value as reported by "du -ks /hdfs/users/trinat")
- 208986905727 = 600000000000 - 3*130337698091 is the remaining space counting replicas
- the actual remaining space for new files is 208986905727/3 = 69662301909 (about 69 Gbytes), assuming a replication count of 3.
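For reference, the eight columns printed by "hdfs dfs -count -q" are, left to right (consistent with the /users/trinat line above):
  QUOTA  REMAINING_QUOTA  SPACE_QUOTA  REMAINING_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME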
debug functions
- start the data node: service hadoop-hdfs-datanode restart
- look at the log file: tail -100 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-ladd12.triumf.ca.log
- mount hdfs: unset LD_LIBRARY_PATH; hadoop-fuse-dfs dfs://ladd12:8020 /mnt/xxx
- watch name node: http://ladd12.triumf.ca:50070
- watch data node: http://ladd08.triumf.ca:50075
- report status and configuration: hdfs dfsadmin -report
- report disk usage: hdfs dfs -count -q /users/"*"
- report location of nodes: hdfs dfsadmin -printTopology
- report everything about every file: hdfs fsck / -openforwrite -files -blocks -locations -racks
Performance
- cluster with 3 data nodes: ladd12 (quad-core i7-860), ladd08 (dual opteron 2GHz), ladd05 (dual-core Athlon 2.6 GHz)
- hdfs mounted on ladd12
- write benchmark:
  ladd12:olchansk$ /usr/bin/time dd if=/dev/zero of=xxx bs=1024k count=1000
  1000+0 records in
  1000+0 records out
  1048576000 bytes (1.0 GB) copied, 31.3406 s, 33.5 MB/s
  0.00user 0.18system 0:31.43elapsed 0%CPU (0avgtext+0avgdata 7104maxresident)k
  0inputs+0outputs (0major+483minor)pagefaults 0swaps
  ladd12:olchansk$ /usr/bin/time dd if=/dev/zero of=xxx2 bs=1024k count=10000
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 436.875 s, 24.0 MB/s
  0.00user 1.72system 7:16.97elapsed 0%CPU (0avgtext+0avgdata 7104maxresident)k
  0inputs+0outputs (0major+483minor)pagefaults 0swaps
ganglia reports peak network use 30 Mbytes/sec, ladd12: load 1, cpu use 10%, disk use 15%+15%busy; ladd05: load 2, cpu use 30%sys, 30%wait, disk use 15%+20%busy; ladd08: load 6, cpu use 30%sys, 70%wait, disk use 40%+40%busy.
- read benchmark:
  ladd12:olchansk$ ls -l
  total 11391305
  -rw-r--r-- 1 olchansk nobody   130360111 Dec 28 18:36 run21600sub00000.mid.gz
  -rw-r--r-- 1 olchansk nobody  1048576000 Jan  2 10:35 xxx
  -rw-r--r-- 1 olchansk nobody 10485760000 Jan  2 10:37 xxx2
  ladd12:olchansk$ /usr/bin/time dd if=run21600sub00000.mid.gz of=/dev/null
  254609+1 records in
  254609+1 records out
  130360111 bytes (130 MB) copied, 1.55497 s, 83.8 MB/s
  0.02user 0.08system 0:01.69elapsed 6%CPU (0avgtext+0avgdata 3024maxresident)k
  254632inputs+0outputs (1major+227minor)pagefaults 0swaps
  ladd12:olchansk$ /usr/bin/time dd if=xxx of=/dev/null
  2048000+0 records in
  2048000+0 records out
  1048576000 bytes (1.0 GB) copied, 11.4278 s, 91.8 MB/s
  0.18user 0.69system 0:11.43elapsed 7%CPU (0avgtext+0avgdata 3008maxresident)k
  2048000inputs+0outputs (0major+227minor)pagefaults 0swaps
  ladd12:olchansk$ /usr/bin/time dd if=xxx2 of=/dev/null
  20480000+0 records in
  20480000+0 records out
  10485760000 bytes (10 GB) copied, 130.188 s, 80.5 MB/s
  1.68user 7.10system 2:10.19elapsed 6%CPU (0avgtext+0avgdata 3024maxresident)k
  20480000inputs+0outputs (0major+228minor)pagefaults 0swaps
no network use; cpu use on ladd12 is 10%+30%wait; disk use is 0% because all data was cached in memory (10 GB file, 16 GB RAM).
Network ports
- data node ports:
  [root@ladd08 ~]# diff xxx1 xxx2 | grep LISTEN
  > tcp  0  0 :::50020  :::*  LISTEN
  > tcp  0  0 :::47860  :::*  LISTEN
  > tcp  0  0 :::50010  :::*  LISTEN
  > tcp  0  0 :::50075  :::*  LISTEN
port 47860 is the java rmi server port; it is different on different machines and sometimes changes after a restart. How to pin it to a fixed port number so it can be firewalled?
- name node:
  [root@ladd12 ~]# netstat -an | grep LISTEN | grep tcp > xxx2
  [root@ladd12 ~]# diff xxx1 xxx2
  < tcp  0  0 ::ffff:142.90.119.126:8020  :::*  LISTEN
  < tcp  0  0 :::50070                    :::*  LISTEN
  < tcp  0  0 :::52745                    :::*  LISTEN
port 52745 is probably also the java rmi server port...
- name node (lsof view; the second java process on ladd12 is its local data node):
  > lsof -P | grep LISTEN | grep java
  java 18096 hdfs 144u IPv4 1464177661 0t0 TCP *:44898 (LISTEN)
  java 18096 hdfs 155u IPv4 1464179424 0t0 TCP ladd12.triumf.ca:8020 (LISTEN)
  java 18096 hdfs 165u IPv4 1464180124 0t0 TCP *:50070 (LISTEN)
  java 18332 hdfs 144u IPv4 1464181247 0t0 TCP *:59445 (LISTEN)
  java 18332 hdfs 151u IPv4 1464182326 0t0 TCP *:50010 (LISTEN)
  java 18332 hdfs 152u IPv4 1464182357 0t0 TCP *:50075 (LISTEN)
  java 18332 hdfs 157u IPv4 1464182653 0t0 TCP *:50020 (LISTEN)
ports 44898 and 59445 are not documented; they are probably also java rmi server ports.