HADOOP
Create a data node
- install data node software
  cd /triumfcs/trshare/olchansk/linux/hadoop/
  rpm --import RPM-GPG-KEY-cloudera
  rpm -vh --install cdh3-repository-1.0-1.noarch.rpm-SL5 (or -SL6)
  (cd $HOME; sh /triumfcs/trshare/olchansk/linux/hadoop/jdk-6u30-linux-x64-rpm.bin)
  cd ~
  yum install hadoop"*"datanode hadoop"*"fuse hadoop"*"native
  chkconfig hadoop-0.20-datanode off
- FIXME: make the hadoop UID/GID consistent across nodes somehow - at present it is different on every machine, which is wrong!
- configure data node
  ln -s /home/olchansk/sysadm/hadoop/conf.daq_test /etc/hadoop-0.20
  alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.daq_test 50
  alternatives --display hadoop-0.20-conf
  mkdir /data8/hdfs_data
  chown -R hdfs.hdfs /data8/hdfs_data
  (add /data8/hdfs_data to /home/olchansk/sysadm/hadoop/conf.daq_test/hdfs-site.xml)
  service hadoop-0.20-datanode start
  tail -100 /usr/lib/hadoop-0.20/logs/hadoop-hadoop-datanode-ladd08.triumf.ca.log
- mount hdfs: unset LD_LIBRARY_PATH; hadoop-fuse-dfs dfs://ladd12:8020 /mnt/xxx
- watch name node: http://ladd12.triumf.ca:50070
- watch data node: http://ladd08.triumf.ca:50075
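The ownership and UID/GID concerns in the steps above can be checked with a small script run on each data node. This is a minimal sketch, not part of the original procedure: the helper names `show_ids` and `dir_owner` are hypothetical, the `hdfs` account and `/data8/hdfs_data` path are taken from the `chown` step, and `stat -c` assumes GNU coreutils as found on Scientific Linux.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Hadoop or CDH3.

# Print "user uid gid" for an account, so the output collected from
# several machines can be compared side by side.
show_ids() {
    printf '%s %s %s\n' "$1" "$(id -u "$1")" "$(id -g "$1")"
}

# Print "owner:group" of a directory (GNU stat syntax).
dir_owner() {
    stat -c '%U:%G' "$1"
}

# Example, to be run on each data node:
#   show_ids hdfs
#   dir_owner /data8/hdfs_data    # expected to print hdfs:hdfs after the chown
```

If `show_ids` prints different numbers for the same account on different machines, that is the mismatch the FIXME refers to. Pass whichever account name the packages created on the node.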
Performance
- cluster with 3 data nodes: ladd12 (quad-core i7-860), ladd08 (dual Opteron 2 GHz), ladd05 (dual-core Athlon 2.6 GHz)
- hdfs mounted on ladd12
- write benchmark:
  ladd12:olchansk$ /usr/bin/time dd if=/dev/zero of=xxx bs=1024k count=1000
  1000+0 records in
  1000+0 records out
  1048576000 bytes (1.0 GB) copied, 31.3406 s, 33.5 MB/s
  0.00user 0.18system 0:31.43elapsed 0%CPU (0avgtext+0avgdata 7104maxresident)k
  0inputs+0outputs (0major+483minor)pagefaults 0swaps
  ladd12:olchansk$ /usr/bin/time dd if=/dev/zero of=xxx2 bs=1024k count=10000
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 436.875 s, 24.0 MB/s
  0.00user 1.72system 7:16.97elapsed 0%CPU (0avgtext+0avgdata 7104maxresident)k
  0inputs+0outputs (0major+483minor)pagefaults 0swaps
ganglia reports peak network use of 30 Mbytes/sec. Per node:
- ladd12: load 1, cpu use 10%, disk use 15%+15% busy
- ladd05: load 2, cpu use 30% sys, 30% wait, disk use 15%+20%
- ladd08: load 6, cpu use 30% sys, 70% wait, disk use 40%+40%
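The throughput figures printed by dd in the write benchmark can be re-derived from the byte counts and elapsed times. The following is a minimal sketch; `rate_mb` is a hypothetical helper name, and it assumes dd's convention that 1 MB = 10^6 bytes.

```shell
#!/bin/sh
# Sketch only: rate_mb is a hypothetical helper name.

# Print throughput in MB/s (1 MB = 10^6 bytes, the unit dd reports)
# given a byte count and an elapsed time in seconds.
rate_mb() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1e6 }'
}

rate_mb 1048576000 31.3406     # 1 GB write, prints 33.5
rate_mb 10485760000 436.875    # 10 GB write, prints 24.0
```

Both results match the dd output, so the drop from 33.5 MB/s to 24.0 MB/s between the 1 GB and 10 GB writes is a measured difference and not a rounding or unit artifact.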