== HADOOP ==
== Links ==
* https://ccp.cloudera.com/display/DOC/Documentation
* http://hadoop.apache.org/common/docs/current/
* http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
* https://ladd00.triumf.ca/hadoop/
== Management ==
=== Complete removal ===
<pre>
xemacs -nw /etc/rc.local       # comment out all hdfs stuff
umount /hdfs
service hadoop-hdfs-datanode stop
chkconfig --list | grep hdfs
chkconfig --list | grep had
rm /datah
cd /iris_data1/                # this is different on every data node
rm -rf hdfs_data/
rpm -qa | grep -i hadoop
yum erase hadoop
rm -rf /etc/hadoop/
rm /etc/alternatives/hadoop-conf
rm -rf /etc/hadoop* /etc/alternatives/hadoop*
rm /etc/yum.repos.d/cloudera-cdh4.repo
rm -rf /var/log/hadoop*
rm -rf /var/lib/hadoop*
userdel hdfs
grep hdfs /etc/passwd
sync
</pre>
=== Base install ===
<pre>
cd /triumfcs/trshare/olchansk/linux/hadoop/
rpm --import RPM-GPG-KEY-cloudera
rpm -vh --install cloudera-cdh-4-0.noarch.rpm
yum install hadoop-hdfs hadoop-hdfs-fuse
ln -s /home/olchansk/sysadm/hadoop/conf.daq_test /etc/hadoop/
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.daq_test 50
alternatives --display hadoop-conf
</pre>
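A quick sanity check after the base install (standard commands, not part of the original notes):
<pre>
# confirm the installed version and the active configuration directory
hadoop version
readlink /etc/alternatives/hadoop-conf
</pre>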
=== Create a name node ===
* yum install hadoop-hdfs-namenode
* chkconfig hadoop-hdfs-namenode off
* TBW (a sketch of the usual one-time initialization follows the firewall block below)
* add to /etc/rc.local: firewall rules and service startup
<pre>
iptables -F hadoop-namenode
iptables -N hadoop-namenode
iptables -A hadoop-namenode -p tcp -s 142.90.0.0/16 --dport 8020 -j ACCEPT
iptables -A hadoop-namenode -p tcp -s 142.90.0.0/16 --dport 50070 -j ACCEPT
iptables -A hadoop-namenode -p tcp --dport 8020 -j REJECT
iptables -A hadoop-namenode -p tcp --dport 50070 -j REJECT
iptables -A INPUT -j hadoop-namenode
iptables -L -v
service hadoop-hdfs-namenode restart
</pre>
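The TBW step is the one-time creation of the namenode's on-disk namespace. A minimal sketch, assuming a stock CDH4 setup where dfs.namenode.name.dir in hdfs-site.xml points at an empty directory owned by hdfs; this is a guess at the missing step, not the recorded site procedure:
<pre>
# one-time namespace format, run as the hdfs user -- destroys any existing namespace
sudo -u hdfs hdfs namenode -format
service hadoop-hdfs-namenode start
</pre>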
=== Create a data node ===
(only SL6 is supported. SL5 has problems with dead /hdfs mounts)
* install data node software
<pre>
yum install hadoop-hdfs-datanode
chkconfig hadoop-hdfs-datanode off
</pre>
* FIXME: adjust hadoop UID/GID somehow - it is different on every machine! wrong, wrong, wrong! (a possible fix is sketched after the next block)
* configure data node (dataNNN is the local data disk)
<pre>
mkdir /dataNNN/hdfs_data
chown -R hdfs.hdfs /dataNNN/hdfs_data
ln -s /dataNNN/hdfs_data /datah
</pre>
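One possible (untested here) fix for the UID/GID FIXME: create the hadoop and hdfs accounts with fixed site-wide IDs before installing the RPMs, so the package scripts reuse them and ownership on the data disks matches across nodes. The numeric IDs below are made-up examples:
<pre>
# hypothetical: pin site-wide IDs before "yum install hadoop-hdfs-datanode"
groupadd -g 600 hadoop
groupadd -g 601 hdfs
useradd -u 601 -g hdfs -G hadoop -r -m -d /var/lib/hadoop-hdfs hdfs
</pre>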
* login as root@nameserver (ladd12)
** cd /etc/hadoop/conf
** add new node to hosts
** add new node to topology.perl
** su - hdfs, run: hadoop dfsadmin -refreshNodes
* login as root@ladd00
** add new node to /etc/httpd/conf.d/ssl.conf around "hadoop-50075"
** service httpd restart
* observe data node is listed as dead on the nameserver web page
* add to /etc/rc.local: firewall rules, service startup and hdfs mount
<pre>
iptables -F hadoop
iptables -N hadoop
iptables -A hadoop -p tcp -s 142.90.0.0/16 --dport 50010 -j ACCEPT
iptables -A hadoop -p tcp -s 142.90.0.0/16 --dport 50020 -j ACCEPT
iptables -A hadoop -p tcp -s 142.90.0.0/16 --dport 50075 -j ACCEPT
iptables -A hadoop -p tcp --dport 50010 -j REJECT
iptables -A hadoop -p tcp --dport 50020 -j REJECT
iptables -A hadoop -p tcp --dport 50075 -j REJECT
iptables -A INPUT -j hadoop
iptables -L -v
service hadoop-hdfs-datanode restart
mkdir /hdfs
hadoop-fuse-dfs dfs://ladd12:8020 /hdfs
</pre>
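Since a dead /hdfs fuse mount is the known failure mode (see the SL5 note above), a watchdog along these lines could be cron'ed; this is a sketch, not part of the original setup:
<pre>
#!/bin/sh
# remount /hdfs if the fuse mount has stopped answering
if ! stat -t /hdfs >/dev/null 2>&1; then
   umount -l /hdfs
   hadoop-fuse-dfs dfs://ladd12:8020 /hdfs
fi
</pre>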
* start the data node: run /etc/rc.local
* observe data node is now listed as live on the nameserver web page
=== decommission a data node ===
* add node name to "hosts.exclude"; it should be typed exactly as it is entered in "hosts"
* login hdfs@namenode
* hdfs dfsadmin -refreshNodes
* go to the hdfs web interface "Live Nodes" page, observe the data node status is "decommission in progress"
* go to the hdfs web interface "Decommissioning Nodes" page, observe the node is listed and "Under Replicated Blocks" counts down to zero
* after the node is no longer listed in "Decommissioning Nodes" and is listed as "Decommissioned" in "Live Nodes", it can be shut down.
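The same status is visible from the command line; dfsadmin -report prints a per-node "Decommission Status" field:
<pre>
hdfs dfsadmin -report | egrep "^Name|Decommission"
</pre>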
=== create user directory ===
* login hdfs@namenode
* hadoop fs -mkdir /users/trinat
* hadoop fs -chown trinat /users/trinat
=== rebalance the data ===
(does not work with)
* login hdfs@namenode
* hdfs balancer
* watch the grass grow
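The balancer also accepts a threshold argument (percent deviation from the cluster average utilization; the default is 10). For example:
<pre>
# run until every datanode is within 5% of the cluster average utilization
hdfs balancer -threshold 5
</pre>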
=== Setup quotas ===
* ssh hdfs@namenode
* hdfs dfsadmin -setSpaceQuota 600g /users/trinat
note: the quota applies to the total size of files and their replicas. If the default replication count is 3, the user can create 600g/3 = 200g worth of files.
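The matching removal command, standard dfsadmin usage:
<pre>
hdfs dfsadmin -clrSpaceQuota /users/trinat
</pre>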
=== Report quotas ===
* (as any user)
* hdfs dfs -count -q /users/"*"
The reported numbers are not obvious, for example:
<pre>
ladd12:fs$ hdfs dfs -count -q /users/"*"
none  inf  none          inf           1  5    7809498      /users/lindner
none  inf  none          inf           1  1    130360111    /users/olchansk
none  inf  600000000000  208986905727  2  146  130337698091 /users/trinat
</pre>
Here:
* 600000000000 is the value set by dfsadmin -setSpaceQuota
* 130337698091 is the size of all files not counting replicas (same value as reported by "du -ks /hdfs/users/trinat")
* 208986905727 = 600000000000 - 3*130337698091 is the remaining space counting replicas
* the actual remaining space is 208986905727/3 = 69662301909 (about 69 Gbytes), assuming a replication count of 3.
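A small helper that applies the arithmetic above to every directory with a space quota; it assumes a replication count of 3 and the standard 8-column -count -q output (QUOTA, REM_QUOTA, SPACE_QUOTA, REM_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME):
<pre>
# print usable remaining space per quota'ed directory
hdfs dfs -count -q /users/"*" | awk '$3 != "none" { printf "%s: %.1f GB usable left\n", $8, $4/3/1e9 }'
</pre>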
=== debug functions ===
* start the data node: service hadoop-hdfs-datanode restart
* look at the log file: tail -100 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-ladd12.triumf.ca.log
* mount hdfs: unset LD_LIBRARY_PATH; hadoop-fuse-dfs dfs://ladd12:8020 /mnt/xxx
* watch name node: http://ladd12.triumf.ca:50070
* watch data node: http://ladd08.triumf.ca:50075
* report status and configuration: hdfs dfsadmin -report
* report disk usage: hdfs dfs -count -q /users/"*"
* report location of nodes: hdfs dfsadmin -printTopology
* report everything about every file: hdfs fsck / -openforwrite -files -blocks -locations -racks
== Performance ==
* cluster with 3 data nodes: ladd12 (quad-core i7-860), ladd08 (dual opteron 2GHz), ladd05 (dual-core Athlon 2.6 GHz)
* hdfs mounted on ladd12
* write benchmark:
<pre>
ladd12:olchansk$ /usr/bin/time dd if=/dev/zero of=xxx bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 31.3406 s, 33.5 MB/s
0.00user 0.18system 0:31.43elapsed 0%CPU (0avgtext+0avgdata 7104maxresident)k
0inputs+0outputs (0major+483minor)pagefaults 0swaps
ladd12:olchansk$ /usr/bin/time dd if=/dev/zero of=xxx2 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 436.875 s, 24.0 MB/s
0.00user 1.72system 7:16.97elapsed 0%CPU (0avgtext+0avgdata 7104maxresident)k
0inputs+0outputs (0major+483minor)pagefaults 0swaps
</pre>
ganglia reports peak network use 30 Mbytes/sec; ladd12: load 1, cpu use 10%, disk use 15%+15%busy; ladd05: load 2, cpu use 30%sys, 30%wait, disk use 15%+20%busy; ladd08: load 6, cpu use 30%sys, 70%wait, disk use 40%+40%busy.
* read benchmark:
<pre>
ladd12:olchansk$ ls -l
total 11391305
-rw-r--r-- 1 olchansk nobody   130360111 Dec 28 18:36 run21600sub00000.mid.gz
-rw-r--r-- 1 olchansk nobody  1048576000 Jan  2 10:35 xxx
-rw-r--r-- 1 olchansk nobody 10485760000 Jan  2 10:37 xxx2
ladd12:olchansk$ /usr/bin/time dd if=run21600sub00000.mid.gz of=/dev/null
254609+1 records in
254609+1 records out
130360111 bytes (130 MB) copied, 1.55497 s, 83.8 MB/s
0.02user 0.08system 0:01.69elapsed 6%CPU (0avgtext+0avgdata 3024maxresident)k
254632inputs+0outputs (1major+227minor)pagefaults 0swaps
ladd12:olchansk$ /usr/bin/time dd if=xxx of=/dev/null
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 11.4278 s, 91.8 MB/s
0.18user 0.69system 0:11.43elapsed 7%CPU (0avgtext+0avgdata 3008maxresident)k
2048000inputs+0outputs (0major+227minor)pagefaults 0swaps
ladd12:olchansk$ /usr/bin/time dd if=xxx2 of=/dev/null
20480000+0 records in
20480000+0 records out
10485760000 bytes (10 GB) copied, 130.188 s, 80.5 MB/s
1.68user 7.10system 2:10.19elapsed 6%CPU (0avgtext+0avgdata 3024maxresident)k
20480000inputs+0outputs (0major+228minor)pagefaults 0swaps
</pre>
no network use; cpu use on ladd12 is 10%+30%wait; disk use is 0% because all data was cached in memory (10GB file, 16GB RAM).
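To measure the disks rather than the cache, the page cache can be flushed between runs (standard Linux knob, not used in the original test):
<pre>
# as root: flush the page cache before re-running the read benchmark
sync
echo 3 > /proc/sys/vm/drop_caches
</pre>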
== Network ports ==
* data node ports:
<pre>
[root@ladd08 ~]# diff xxx1 xxx2 | grep LISTEN
> tcp 0 0 :::50020 :::* LISTEN
> tcp 0 0 :::47860 :::* LISTEN
> tcp 0 0 :::50010 :::* LISTEN
> tcp 0 0 :::50075 :::* LISTEN
</pre>
port 47860 is the java rmi server port; it is different on different machines and sometimes changes after a restart. how to pin it to a fixed port number so I can firewall it?!?
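A generic way to see which java process owns the roving port on a given node (standard tools, nothing hadoop-specific):
<pre>
# identify the listener on the roving port
netstat -anp | grep 47860
lsof -P -i TCP:47860
</pre>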
* name node:
<pre>
[root@ladd12 ~]# netstat -an | grep LISTEN | grep tcp > xxx2
[root@ladd12 ~]# diff xxx1 xxx2
< tcp 0 0 ::ffff:142.90.119.126:8020 :::* LISTEN
< tcp 0 0 :::50070 :::* LISTEN
< tcp 0 0 :::52745 :::* LISTEN
</pre>
port 52745 is probably also the java rmi server port...
* name node (the same lsof also shows the data node java process, pid 18332, running on ladd12):
<pre>
> lsof -P | grep LISTEN | grep java
java 18096 hdfs 144u IPv4 1464177661 0t0 TCP *:44898 (LISTEN)
java 18096 hdfs 155u IPv4 1464179424 0t0 TCP ladd12.triumf.ca:8020 (LISTEN)
java 18096 hdfs 165u IPv4 1464180124 0t0 TCP *:50070 (LISTEN)
java 18332 hdfs 144u IPv4 1464181247 0t0 TCP *:59445 (LISTEN)
java 18332 hdfs 151u IPv4 1464182326 0t0 TCP *:50010 (LISTEN)
java 18332 hdfs 152u IPv4 1464182357 0t0 TCP *:50075 (LISTEN)
java 18332 hdfs 157u IPv4 1464182653 0t0 TCP *:50020 (LISTEN)
</pre>
ports 44898 and 59445 are not documented; probably also java rmi server ports (44898 belongs to the name node process, 59445 to the data node process).