HADOOP: Difference between revisions

From DaqWiki
Jump to navigation Jump to search
m (79 revisions imported)
 
(49 intermediate revisions by the same user not shown)
Line 3: Line 3:
== Links ==
== Links ==


* http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html
* https://ccp.cloudera.com/display/DOC/Documentation
* http://archive.cloudera.com/cdh/3/hadoop/hdfs-default.html
* http://hadoop.apache.org/common/docs/current/
* http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
* https://ladd00.triumf.ca/hadoop/
* https://ladd00.triumf.ca/hadoop/


== Management ==
== Management ==
=== Complete removal ===
<pre>
  638  xemacs -nw /etc/rc.local --- comment out all hdfs stuff
  639  umount /hdfs
  642  service hadoop-hdfs-datanode stop
  643  chkconfig --list | grep hdfs
  644  chkconfig --list | grep had
  650  rm /datah
  652  cd /iris_data1/ --- this is different on every data node
  656  rm -rf hdfs_data/
  659  rpm -qa | grep -i hadoop
  660  yum erase hadoop
  666  rm -rf /etc/hadoop/
  667  rm /etc/alternatives/hadoop-conf
  668  rm -rf /etc/hadoop"*" /etc/alternatives/hadoop"*"
  669  rm /etc/yum.repos.d/cloudera-cdh4.repo
  670  rm -rf /var/log/hadoop"*"
  671  rm -rf /var/lib/hadoop"*"
  672  userdel hdfs
  673  grep hdfs /etc/passwd
  674  sync
</pre>
=== Base install ===
<pre>
cd /triumfcs/trshare/olchansk/linux/hadoop/
rpm --import RPM-GPG-KEY-cloudera
rpm -vh --install cloudera-cdh-4-0.noarch.rpm
yum install hadoop-hdfs hadoop-hdfs-fuse
ln -s /home/olchansk/sysadm/hadoop/conf.daq_test /etc/hadoop/
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.daq_test 50
alternatives --display hadoop-conf
</pre>


=== Create a name node ===
=== Create a name node ===


* yum install hadoop-hdfs-namenode
* chkconfig hadoop-hdfs-namenode off
* TBW
* TBW


Line 23: Line 62:
iptables -A INPUT -j hadoop-namenode
iptables -A INPUT -j hadoop-namenode
iptables -L -v
iptables -L -v
service hadoop-0.20-namenode start
service hadoop-hdfs-namenode restart
</pre>
</pre>


Line 32: Line 71:
* install data node software
* install data node software
<pre>
<pre>
cd /triumfcs/trshare/olchansk/linux/hadoop/
yum install hadoop-hdfs-datanode
rpm --import RPM-GPG-KEY-cloudera
chkconfig hadoop-hdfs-datanode off
rpm -vh --install cdh3-repository-1.0-1.noarch.rpm-SL6
(cd $HOME; sh /triumfcs/trshare/olchansk/linux/hadoop/jdk-6u30-linux-x64-rpm.bin)
cd ~
yum install hadoop"*"datanode hadoop"*"fuse hadoop"*"native
chkconfig hadoop-0.20-datanode off
</pre>
</pre>


* FIXME: adjust hadoop UID/GID somehow - it is different on every machine! wrong, wrong, wrong!
* FIXME: adjust hadoop UID/GID somehow - it is different on every machine! wrong, wrong, wrong!


* configure data node
* configure data node (dataNNN is the local data disk)
<pre>
<pre>
ln -s /home/olchansk/sysadm/hadoop/conf.daq_test /etc/hadoop-0.20
mkdir /dataNNN/hdfs_data
alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.daq_test 50
chown -R hdfs.hdfs /dataNNN/hdfs_data
alternatives --display hadoop-0.20-conf
ln -s /dataNNN/hdfs_data /datah
mkdir /data8/hdfs_data
chown -R hdfs.hdfs /data8/hdfs_data
ln -s /data8/hdfs_data /datah
</pre>
</pre>


* add ladd08 to the hadoop cluster:
* login as root@nameserver (ladd12)
** cd /etc/hadoop/conf
** cd /etc/hadoop/conf
###** in hdfs-site.xml: add "/data8/hdfs_data" to dfs.data.dir
** add new node to hosts
###** in hdfs-site.xml: increase dfs.datanode.failed.volumes.tolerated by one (should be "number of dfs.data.dir entries" minus 1).
** add new node to topology.perl
** add "ladd08.triumf.ca" to hosts
** su - hdfs, run: hadoop dfsadmin -refreshNodes
** add ladd08 IP address to topology.data
 
###** restart the name server
* login as root@ladd00
** login to hdfs@nameserver, run hadoop hdfsAdmin -
** add new node to /etc/httpd/conf.d/ssl.conf around "hadoop-50075"
** service httpd restart
 
* observe data node is listed as dead on the nameserver web page


* add to /etc/rc.local: firewall rules, service startup and hdfs mount
* add to /etc/rc.local: firewall rules, service startup and hdfs mount
Line 74: Line 108:
iptables -A INPUT -j hadoop
iptables -A INPUT -j hadoop
iptables -L -v
iptables -L -v
service hadoop-0.20-datanode start
service hadoop-hdfs-datanode restart


mkdir /hdfs
mkdir /hdfs
unset LD_LIBRARY_PATH; hadoop-fuse-dfs dfs://ladd12:8020 /hdfs
hadoop-fuse-dfs dfs://ladd12:8020 /hdfs
</pre>
</pre>
* start the data node: run /etc/rc.local
* observe data node is now listed as live on the nameserver web page
=== decommission a data node ===
* add node name to "hosts.exclude", should be typed exactly as it is entered in "hosts"
* login hdfs@namenode
* hdfs dfsadmin -refreshNodes
* go to the hdfs web interface "Live Nodes" page, observe the data node status is "decommission in progress"
* go to the hdfs web interface "Decommissioning Nodes", observe node is listed, observe "Under Replicated Blocks" count down to zero
* after the node no longer listed in "Decommissioning Nodes" and is listed as "Decommissioned" in "Live Nodes", it can be shut down.
=== create user directory ===
* login hdfs@namenode
* hadoop fs -mkdir /users/trinat
* hadoop fs -chown trinat /users/trinat
=== rebalance the data ===
(does not work with)
* login hdfs@namenode
* hdfs balancer
* watch the grass grow
=== Setup quotas ===
* ssh hdfs@namenode
* hdfs dfsadmin -setSpaceQuota 600g /users/trinat
note: the quota applies to the size of files and their replicas. If default replication count is 3, the user can create 600g/3 = 200g worth of files.
=== Report quotas ===
* (as any user)
* hdfs dfs -count -q /users/"*"
The reported numbers are not obvious, for example:
<pre>
ladd12:fs$ hdfs dfs -count -q /users/"*"
        none            inf            none            inf            1            5            7809498 /users/lindner
        none            inf            none            inf            1            1          130360111 /users/olchansk
        none            inf    600000000000    208986905727            2          146      130337698091 /users/trinat
</pre>
Here:
* 600000000000 is the value set by dfsadmin -setSpaceQuota
* 130337698091 is the size of all files not counting replicas (same value as reported by "du -ks /hdfs/users/trinat")
* 208986905727 = 600000000000 - 3*130337698091 is the remaining space couning replicas
* actual remaining space is 130337698091/3 = 69662301909 (about 69 Gbytes) assuming replication count of 3.


=== debug functions ===
=== debug functions ===


* start the data node: service hadoop-0.20-datanode restart
* start the data node: service hadoop-hdfs-datanode restart
* look at the log file: tail -100 /usr/lib/hadoop-0.20/logs/hadoop-hadoop-datanode-ladd08.triumf.ca.log
* look at the log file: tail -100 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-ladd12.triumf.ca.log
* mount hdfs: unset LD_LIBRARY_PATH; hadoop-fuse-dfs dfs://ladd12:8020 /mnt/xxx
* mount hdfs: unset LD_LIBRARY_PATH; hadoop-fuse-dfs dfs://ladd12:8020 /mnt/xxx
* watch name node: http://ladd12.triumf.ca:50070
* watch name node: http://ladd12.triumf.ca:50070
* watch data node: http://ladd08.triumf.ca:50075
* watch data node: http://ladd08.triumf.ca:50075
* report status and configuration: hdfs dfsadmin -report
* report disk usage: hdfs dfs -count -q /users/"*"
* report location of nodes: hdfs dfsadmin -printTopology
* report everything about every file: hdfs fsck / -openforwrite -files -blocks -locations -racks


== Performance ==
== Performance ==
Line 163: Line 254:


port 52745 is probably also the java rmi server port...
port 52745 is probably also the java rmi server port...
* name node:
<pre>
> lsof -P | grep LISTEN | grep java
java      18096      hdfs  144u    IPv4        1464177661      0t0        TCP *:44898 (LISTEN)
java      18096      hdfs  155u    IPv4        1464179424      0t0        TCP ladd12.triumf.ca:8020 (LISTEN)
java      18096      hdfs  165u    IPv4        1464180124      0t0        TCP *:50070 (LISTEN)
java      18332      hdfs  144u    IPv4        1464181247      0t0        TCP *:59445 (LISTEN)
java      18332      hdfs  151u    IPv4        1464182326      0t0        TCP *:50010 (LISTEN)
java      18332      hdfs  152u    IPv4        1464182357      0t0        TCP *:50075 (LISTEN)
java      18332      hdfs  157u    IPv4        1464182653      0t0        TCP *:50020 (LISTEN)
</pre>
port 44898 and 59445 are not documented. probably java rpm server ports

Latest revision as of 09:19, 2 February 2022

HADOOP

Links

Management

Complete removal

  638  xemacs -nw /etc/rc.local --- comment out all hdfs stuff
  639  umount /hdfs
  642  service hadoop-hdfs-datanode stop
  643  chkconfig --list | grep hdfs
  644  chkconfig --list | grep had
  650  rm /datah
  652  cd /iris_data1/ --- this is different on every data node
  656  rm -rf hdfs_data/
  659  rpm -qa | grep -i hadoop
  660  yum erase hadoop
  666  rm -rf /etc/hadoop/
  667  rm /etc/alternatives/hadoop-conf 
  668  rm -rf /etc/hadoop"*" /etc/alternatives/hadoop"*"
  669  rm /etc/yum.repos.d/cloudera-cdh4.repo
  670  rm -rf /var/log/hadoop"*"
  671  rm -rf /var/lib/hadoop"*"
  672  userdel hdfs
  673  grep hdfs /etc/passwd
  674  sync

Base install

cd /triumfcs/trshare/olchansk/linux/hadoop/
rpm --import RPM-GPG-KEY-cloudera
rpm -vh --install cloudera-cdh-4-0.noarch.rpm
yum install hadoop-hdfs hadoop-hdfs-fuse
ln -s /home/olchansk/sysadm/hadoop/conf.daq_test /etc/hadoop/
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.daq_test 50
alternatives --display hadoop-conf

Create a name node

  • yum install hadoop-hdfs-namenode
  • chkconfig hadoop-hdfs-namenode off
  • TBW
  • add to /etc/rc.local: firewall rules and service startup
iptables -F hadoop-namenode
iptables -N hadoop-namenode
iptables -A hadoop-namenode -p tcp -s 142.90.0.0/16 --dport 8020 -j ACCEPT
iptables -A hadoop-namenode -p tcp -s 142.90.0.0/16 --dport 50070 -j ACCEPT
iptables -A hadoop-namenode -p tcp --dport 8020 -j REJECT
iptables -A hadoop-namenode -p tcp --dport 50070 -j REJECT
iptables -A INPUT -j hadoop-namenode
iptables -L -v
service hadoop-hdfs-namenode restart

Create a data node

(only SL6 is supported. SL5 has problems with dead /hdfs mounts)

  • install data node software
yum install hadoop-hdfs-datanode
chkconfig hadoop-hdfs-datanode off
  • FIXME: adjust hadoop UID/GID somehow - it is different on every machine! wrong, wrong, wrong!
  • configure data node (dataNNN is the local data disk)
mkdir /dataNNN/hdfs_data
chown -R hdfs.hdfs /dataNNN/hdfs_data
ln -s /dataNNN/hdfs_data /datah
  • login as root@nameserver (ladd12)
    • cd /etc/hadoop/conf
    • add new node to hosts
    • add new node to topology.perl
    • su - hdfs, run: hadoop dfsadmin -refreshNodes
  • login as root@ladd00
    • add new node to /etc/httpd/conf.d/ssl.conf around "hadoop-50075"
    • service httpd restart
  • observe data node is listed as dead on the nameserver web page
  • add to /etc/rc.local: firewall rules, service startup and hdfs mount
iptables -F hadoop
iptables -N hadoop
iptables -A hadoop -p tcp -s 142.90.0.0/16 --dport 50010 -j ACCEPT
iptables -A hadoop -p tcp -s 142.90.0.0/16 --dport 50020 -j ACCEPT
iptables -A hadoop -p tcp -s 142.90.0.0/16 --dport 50075 -j ACCEPT
iptables -A hadoop -p tcp --dport 50010 -j REJECT
iptables -A hadoop -p tcp --dport 50020 -j REJECT
iptables -A hadoop -p tcp --dport 50075 -j REJECT
iptables -A INPUT -j hadoop
iptables -L -v
service hadoop-hdfs-datanode restart

mkdir /hdfs
hadoop-fuse-dfs dfs://ladd12:8020 /hdfs
  • start the data node: run /etc/rc.local
  • observe data node is now listed as live on the nameserver web page

decommission a data node

  • add node name to "hosts.exclude", should be typed exactly as it is entered in "hosts"
  • login hdfs@namenode
  • hdfs dfsadmin -refreshNodes
  • go to the hdfs web interface "Live Nodes" page, observe the data node status is "decommission in progress"
  • go to the hdfs web interface "Decommissioning Nodes", observe node is listed, observe "Under Replicated Blocks" count down to zero
  • after the node no longer listed in "Decommissioning Nodes" and is listed as "Decommissioned" in "Live Nodes", it can be shut down.

create user directory

  • login hdfs@namenode
  • hadoop fs -mkdir /users/trinat
  • hadoop fs -chown trinat /users/trinat

rebalance the data

(does not work with)

  • login hdfs@namenode
  • hdfs balancer
  • watch the grass grow

Setup quotas

  • ssh hdfs@namenode
  • hdfs dfsadmin -setSpaceQuota 600g /users/trinat

note: the quota applies to the size of files and their replicas. If default replication count is 3, the user can create 600g/3 = 200g worth of files.

Report quotas

  • (as any user)
  • hdfs dfs -count -q /users/"*"

The reported numbers are not obvious, for example:

ladd12:fs$ hdfs dfs -count -q /users/"*"
        none             inf            none             inf            1            5            7809498 /users/lindner
        none             inf            none             inf            1            1          130360111 /users/olchansk
        none             inf    600000000000    208986905727            2          146       130337698091 /users/trinat

Here:

  • 600000000000 is the value set by dfsadmin -setSpaceQuota
  • 130337698091 is the size of all files not counting replicas (same value as reported by "du -ks /hdfs/users/trinat")
  • 208986905727 = 600000000000 - 3*130337698091 is the remaining space couning replicas
  • actual remaining space is 130337698091/3 = 69662301909 (about 69 Gbytes) assuming replication count of 3.

debug functions

  • start the data node: service hadoop-hdfs-datanode restart
  • look at the log file: tail -100 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-ladd12.triumf.ca.log
  • mount hdfs: unset LD_LIBRARY_PATH; hadoop-fuse-dfs dfs://ladd12:8020 /mnt/xxx
  • watch name node: http://ladd12.triumf.ca:50070
  • watch data node: http://ladd08.triumf.ca:50075
  • report status and configuration: hdfs dfsadmin -report
  • report disk usage: hdfs dfs -count -q /users/"*"
  • report location of nodes: hdfs dfsadmin -printTopology
  • report everything about every file: hdfs fsck / -openforwrite -files -blocks -locations -racks

Performance

  • cluster with 3 data nodes: ladd12 (quad-core i7-860), ladd08 (dual opteron 2GHz), ladd05 (dual-core Athlon 2.6 GHz)
  • hdfs mounted on ladd12
  • write benchmark:
ladd12:olchansk$ /usr/bin/time dd if=/dev/zero of=xxx bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 31.3406 s, 33.5 MB/s
0.00user 0.18system 0:31.43elapsed 0%CPU (0avgtext+0avgdata 7104maxresident)k
0inputs+0outputs (0major+483minor)pagefaults 0swaps
ladd12:olchansk$ /usr/bin/time dd if=/dev/zero of=xxx2 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 436.875 s, 24.0 MB/s
0.00user 1.72system 7:16.97elapsed 0%CPU (0avgtext+0avgdata 7104maxresident)k
0inputs+0outputs (0major+483minor)pagefaults 0swaps

ganglia reports peak network use 30 Mbytes/sec, ladd12: load 1, cpu use 10%, disk use 15%+15%busy; ladd05: load 2, cpu use 30%sys, 30%wait, disk use 15%+20%busy; ladd08: load 6, cpu use 30%sys, 70%wait, disk use 40%+40%busy.

  • read benchmark:
ladd12:olchansk$ ls -l
total 11391305
-rw-r--r-- 1 olchansk nobody   130360111 Dec 28 18:36 run21600sub00000.mid.gz
-rw-r--r-- 1 olchansk nobody  1048576000 Jan  2 10:35 xxx
-rw-r--r-- 1 olchansk nobody 10485760000 Jan  2 10:37 xxx2
ladd12:olchansk$ /usr/bin/time dd if=run21600sub00000.mid.gz of=/dev/null
254609+1 records in
254609+1 records out
130360111 bytes (130 MB) copied, 1.55497 s, 83.8 MB/s
0.02user 0.08system 0:01.69elapsed 6%CPU (0avgtext+0avgdata 3024maxresident)k
254632inputs+0outputs (1major+227minor)pagefaults 0swaps
ladd12:olchansk$ 
ladd12:olchansk$ /usr/bin/time dd if=xxx of=/dev/null
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 11.4278 s, 91.8 MB/s
0.18user 0.69system 0:11.43elapsed 7%CPU (0avgtext+0avgdata 3008maxresident)k
2048000inputs+0outputs (0major+227minor)pagefaults 0swaps
ladd12:olchansk$ 
ladd12:olchansk$ /usr/bin/time dd if=xxx2 of=/dev/null
20480000+0 records in
20480000+0 records out
10485760000 bytes (10 GB) copied, 130.188 s, 80.5 MB/s
1.68user 7.10system 2:10.19elapsed 6%CPU (0avgtext+0avgdata 3024maxresident)k
20480000inputs+0outputs (0major+228minor)pagefaults 0swaps

no network use, cpu use ladd12 is 10%+30%wait, disk use is 0% because all data was cached in memory: 10GB file, 16GB RAM.


Network ports

  • data node ports:
[root@ladd08 ~]# diff xxx1 xxx2 | grep LISTEN
> tcp        0      0 :::50020                    :::*                        LISTEN      
> tcp        0      0 :::47860                    :::*                        LISTEN      
> tcp        0      0 :::50010                    :::*                        LISTEN      
> tcp        0      0 :::50075                    :::*                        LISTEN      

port 47860 is the java rmi server port, it is different on different machines and sometimes changes after restart. how to pin it to a fixed port number so I can firewall it?!?

  • name node:
[root@ladd12 ~]# netstat -an | grep LISTEN | grep tcp > xxx2
[root@ladd12 ~]# diff xxx1 xxx2
< tcp        0      0 ::ffff:142.90.119.126:8020  :::*                        LISTEN      
< tcp        0      0 :::50070                    :::*                        LISTEN      
< tcp        0      0 :::52745                    :::*                        LISTEN      

port 52745 is probably also the java rmi server port...

  • name node:
> lsof -P | grep LISTEN | grep java
java      18096      hdfs  144u     IPv4         1464177661      0t0        TCP *:44898 (LISTEN)
java      18096      hdfs  155u     IPv4         1464179424      0t0        TCP ladd12.triumf.ca:8020 (LISTEN)
java      18096      hdfs  165u     IPv4         1464180124      0t0        TCP *:50070 (LISTEN)
java      18332      hdfs  144u     IPv4         1464181247      0t0        TCP *:59445 (LISTEN)
java      18332      hdfs  151u     IPv4         1464182326      0t0        TCP *:50010 (LISTEN)
java      18332      hdfs  152u     IPv4         1464182357      0t0        TCP *:50075 (LISTEN)
java      18332      hdfs  157u     IPv4         1464182653      0t0        TCP *:50020 (LISTEN)

port 44898 and 59445 are not documented. probably java rpm server ports