Hadoop 3.0 cluster: installation, configuration, and tests on CentOS 7
This article shows step by step how to create a Hadoop cluster with one name node and three slaves.
The starting point is a single VM (provided by the hardware administrator) with these properties:
10.242.5.88
root/123456Qw
# cat /etc/centos-release
CentOS Linux release 7.1.1503 (Core)
# uname -a
Linux ders-hadoop1 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64
x86_64 x86_64 GNU/Linux
Storage information:
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 50G 914M 50G 2% /
devtmpfs 1,9G 0 1,9G 0% /dev
tmpfs 1,9G 0 1,9G 0% /dev/shm
tmpfs 1,9G 8,3M 1,9G 1% /run
tmpfs 1,9G 0 1,9G 0% /sys/fs/cgroup
/dev/mapper/centos-home 73G 33M 73G 1% /home
/dev/sda1 497M 119M 379M 24% /boot
CPU information:
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz
RAM information:
# cat /proc/meminfo
MemTotal: 3876056 kB
and network interface properties:
# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.242.5.88 netmask 255.255.255.0 broadcast 10.242.5.255
inet6 fe80::215:5dff:fe04:6f05 prefixlen 64 scopeid 0x20<link>
ether 00:15:5d:04:6f:05 txqueuelen 1000 (Ethernet)
RX packets 46021 bytes 42368623 (40.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14164 bytes 1063856 (1.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# hostname
ders-hadoop1
The next step is to set the IPs and hostnames in /etc/hosts and to install Java 8.
# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.242.5.88 hadoop-master hadoop-master
10.242.5.89 hadoop-slave-1 hadoop-slave-1
10.242.5.90 hadoop-slave-2 hadoop-slave-2
10.242.5.91 hadoop-slave-3 hadoop-slave-3
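To verify that the new entries resolve, an optional quick check (getent reads /etc/hosts directly):
# getent hosts hadoop-master
10.242.5.88 hadoop-master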
# vi /etc/hostname
Write hadoop-master there, restart the network:
# systemctl restart network.service
and reboot the VM:
# reboot
Now we have the correct hostname:
# hostname
hadoop-master
and we are ready to install Java:
cd /opt/
wget --no-cookies --no-check-certificate \
  --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" \
  "http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.tar.gz"
or, for a newer build: http://download.oracle.com/otn-pub/java/jdk/8u171-b11/512cd62ec5174c3487ac17c61aaa89e8/jdk-8u171-linux-x64.tar.gz
# ls -lh
total 181M
-rw-r--r-- 1 root root 181M jdk-8u161-linux-x64.tar.gz
tar xzf jdk-8u161-linux-x64.tar.gz
If you don't have wget, install it with: yum install wget
After extracting the archive, use the alternatives command to register this Java version. The alternatives command is available in the chkconfig package.
# cd /opt/jdk1.8.0_161/
# alternatives --install /usr/bin/java java /opt/jdk1.8.0_161/bin/java 2
# alternatives --config java
There is 1 program that provides 'java'.
Selection    Command
-----------------------------------------------
*+ 1 /opt/jdk1.8.0_161/bin/java
# alternatives --install /usr/bin/jar jar /opt/jdk1.8.0_161/bin/jar 2
# alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_161/bin/javac 2
# alternatives --set jar /opt/jdk1.8.0_161/bin/jar
# alternatives --set javac /opt/jdk1.8.0_161/bin/javac
# java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
# vi /etc/profile
and add the following lines just above the existing pathmunge () { ... } function:
export JAVA_HOME=/opt/jdk1.8.0_161
export JRE_HOME=/opt/jdk1.8.0_161/jre
export PATH=$PATH:/opt/jdk1.8.0_161/bin:/opt/jdk1.8.0_161/jre/bin
Then reboot the server.
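After the reboot, an optional sanity check that the variables were picked up (expected output for this install path):
# echo $JAVA_HOME
/opt/jdk1.8.0_161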
Add the hadoop user:
# useradd hadoop
# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password must be at least 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
Configuring key-based login. The hadoop user must be able to SSH to itself (and, later, to the slaves) without a password.
# su - hadoop
[hadoop@hadoop-master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UT1YjUi5WO9ldVq2VN7/6KzPZMKMKa08JVVYo/eKhIY hadoop@hadoop-master
The key's randomart image is:
+---[RSA 2048]----+
| .oO+o o|
| .*.=..o*|
| .o.+...==|
| ..oo...+..|
| E S... o. .|
| ..o.*.. ..|
| .o= = + .|
| ..o B |
| o. .o= |
+----[SHA256]-----+
[hadoop@hadoop-master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'hadoop-master (10.242.5.88)' can't be established.
ECDSA key fingerprint is SHA256:3i32OhdNiKfXfaHUGHQP5dfb+9YHkDbjajRxYKmp8Do.
ECDSA key fingerprint is MD5:05:24:e7:9b:2f:a7:c4:9b:2a:ca:85:96:7b:67:1b:bc.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@hadoop-master's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@hadoop-master'"
and check to make sure that only the key(s) you wanted were added.
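To confirm that key-based login works, the following should print the hostname without prompting for a password:
[hadoop@hadoop-master ~]$ ssh hadoop-master hostname
hadoop-master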
Download and Extract Hadoop
# cd /opt
# wget http://apache-mirror.rbc.ru/pub/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
# tar -xzf hadoop-3.0.0.tar.gz
# mv hadoop-3.0.0 hadoop
# chown -R hadoop /opt/hadoop
Configure Hadoop
[root@hadoop-master hadoop]# pwd
/opt/hadoop/etc/hadoop
[root@hadoop-master hadoop]# vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000/</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
(fs.default.name is the deprecated alias of fs.defaultFS; both work in Hadoop 3.0.)
Configure hdfs-site.xml and directories
# mkdir /opt/hadoop/dfs
# mkdir /opt/hadoop/dfs/name
# mkdir /opt/hadoop/dfs/data
# pwd
/opt/hadoop/etc/hadoop
# vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
(dfs.data.dir and dfs.name.dir are deprecated aliases of dfs.datanode.data.dir and dfs.namenode.name.dir; the namenode will also warn that these paths should be URIs, e.g. file:///opt/hadoop/dfs/name — see the format log below.)
Configure mapred-site.xml
# vi mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master:9001</value>
  </property>
</configuration>
(mapred.job.tracker is a Hadoop 1 property; Hadoop 3 runs MapReduce on YARN via mapreduce.framework.name, but this minimal setting does not prevent HDFS from working.)
Edit the hadoop-env.sh file:
# vi hadoop-env.sh
# Set Hadoop-specific environment variables here.
export JAVA_HOME=/opt/jdk1.8.0_161
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
Now I ask our administrator to make 3 clones of the VM, with IPs:
10.242.5.89
10.242.5.90
10.242.5.91
On each clone, edit
# vi /etc/hostname
write the correct slave hostname there, and restart the network:
# systemctl restart network.service
Now check that the nodes can ping each other (a quick check is sketched below).
The next step is checking SSH. Because the slaves are clones of the master, the hadoop user's key pair and authorized_keys were cloned along with the VM, so passwordless SSH from the master should already work.
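A minimal connectivity check from the master, assuming the /etc/hosts entries above are present on every node:
# for h in hadoop-slave-1 hadoop-slave-2 hadoop-slave-3; do ping -c 1 $h; done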
Configure the workers and slaves files on the name node only.
# su - hadoop
# pwd
/opt/hadoop/etc/hadoop
# vi workers
and write these 3 lines:
hadoop-slave-1
hadoop-slave-2
hadoop-slave-3
# cp workers slaves
(Hadoop 3 reads etc/hadoop/workers; the slaves copy is kept only for compatibility with older scripts.)
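A quick way to verify passwordless SSH to every worker at once, reusing the workers file (run as the hadoop user):
$ for h in $(cat /opt/hadoop/etc/hadoop/workers); do ssh hadoop@$h hostname; done
hadoop-slave-1
hadoop-slave-2
hadoop-slave-3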
Set up environment variables:
# su - hadoop
# vi ~/.bashrc
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
and check the Hadoop version:
# source .bashrc
# hadoop version
Hadoop 3.0.0
Source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533
Compiled by andrew on 2017-12-08T19:16Z
Compiled with protoc 2.5.0
From source with checksum 397832cb5529187dc8cd74ad54ff22
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-3.0.0.jar
Next, copy .bashrc to all nodes with:
[hadoop@hadoop-master ~]$ rsync .bashrc hadoop@hadoop-slave-1:/home/hadoop
[hadoop@hadoop-master ~]$ rsync .bashrc hadoop@hadoop-slave-2:/home/hadoop
[hadoop@hadoop-master ~]$ rsync .bashrc hadoop@hadoop-slave-3:/home/hadoop
Run this command on all nodes:
# chown -R hadoop /opt/hadoop
and on the master execute:
# hadoop namenode -format
STARTUP_MSG: java = 1.8.0_161
************************************************************/
2018-02-19 16:08:45,031 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2018-02-19 16:08:45,045 INFO namenode.NameNode: createNameNode [-format]
2018-02-19 16:08:46,623 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-02-19 16:08:49,568 WARN common.Util: Path /opt/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
2018-02-19 16:08:49,570 WARN common.Util: Path /opt/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-34b701e3-4fd1-4680-beb7-3ac01da82a2a
2018-02-19 16:08:49,917 INFO namenode.FSEditLog: Edit logging is async:true
2018-02-19 16:08:50,052 INFO namenode.FSNamesystem: KeyProvider: null
2018-02-19 16:08:50,071 INFO namenode.FSNamesystem: fsLock is fair: true
2018-02-19 16:08:50,078 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
2018-02-19 16:08:50,103 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
2018-02-19 16:08:50,103 INFO namenode.FSNamesystem: supergroup = supergroup
2018-02-19 16:08:50,103 INFO namenode.FSNamesystem: isPermissionEnabled = true
2018-02-19 16:08:50,104 INFO namenode.FSNamesystem: HA Enabled: false
2018-02-19 16:08:50,229 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2018-02-19 16:08:50,270 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2018-02-19 16:08:50,270 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2018-02-19 16:08:50,294 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2018-02-19 16:08:50,294 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Feb 19 16:08:50
2018-02-19 16:08:50,299 INFO util.GSet: Computing capacity for map BlocksMap
2018-02-19 16:08:50,299 INFO util.GSet: VM type = 64-bit
2018-02-19 16:08:50,321 INFO util.GSet: 2.0% max memory 916.4 MB = 18.3 MB
2018-02-19 16:08:50,321 INFO util.GSet: capacity = 2^21 = 2097152 entries
2018-02-19 16:08:50,453 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2018-02-19 16:08:50,471 INFO Configuration.deprecation: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
2018-02-19 16:08:50,471 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2018-02-19 16:08:50,471 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2018-02-19 16:08:50,471 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: defaultReplication = 2
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: maxReplication = 512
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: minReplication = 1
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: redundancyRecheckInterval = 3000ms
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: encryptDataTransfer = false
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
2018-02-19 16:08:50,968 INFO util.GSet: Computing capacity for map INodeMap
2018-02-19 16:08:50,968 INFO util.GSet: VM type = 64-bit
2018-02-19 16:08:50,969 INFO util.GSet: 1.0% max memory 916.4 MB = 9.2 MB
2018-02-19 16:08:50,969 INFO util.GSet: capacity = 2^20 = 1048576 entries
2018-02-19 16:08:50,971 INFO namenode.FSDirectory: ACLs enabled? false
2018-02-19 16:08:50,971 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2018-02-19 16:08:50,971 INFO namenode.FSDirectory: XAttrs enabled? true
2018-02-19 16:08:50,971 INFO namenode.NameNode: Caching file names occurring more than 10 times
2018-02-19 16:08:50,989 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, sna pshotDiffAllowSnapRootDescendant: true
2018-02-19 16:08:51,004 INFO util.GSet: Computing capacity for map cachedBlocks
2018-02-19 16:08:51,004 INFO util.GSet: VM type = 64-bit
2018-02-19 16:08:51,005 INFO util.GSet: 0.25% max memory 916.4 MB = 2.3 MB
2018-02-19 16:08:51,005 INFO util.GSet: capacity = 2^18 = 262144 entries
2018-02-19 16:08:51,053 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2018-02-19 16:08:51,053 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2018-02-19 16:08:51,053 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2018-02-19 16:08:51,084 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2018-02-19 16:08:51,084 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2018-02-19 16:08:51,089 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2018-02-19 16:08:51,089 INFO util.GSet: VM type = 64-bit
2018-02-19 16:08:51,090 INFO util.GSet: 0.029999999329447746% max memory 916.4 MB = 281.5 KB
2018-02-19 16:08:51,090 INFO util.GSet: capacity = 2^15 = 32768 entries
2018-02-19 16:08:51,268 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1839426364-10.242.5.88-1519045731213
2018-02-19 16:08:51,421 INFO common.Storage: Storage directory /opt/hadoop/dfs/name has been successfully formatted.
2018-02-19 16:08:51,482 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2018-02-19 16:08:52,004 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds.
2018-02-19 16:08:52,095 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2018-02-19 16:08:52,120 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/10.242.5.88
************************************************************/
Start the cluster:
If etc/hadoop/workers and SSH trusted access are configured
(see the Hadoop Single Node Setup guide), all of the HDFS processes can be started with a utility script. As the hadoop user:
# start-dfs.sh
Starting namenodes on [hadoop-master]
Starting datanodes
hadoop-slave-3: WARNING: /opt/hadoop/logs does not exist. Creating.
hadoop-slave-2: WARNING: /opt/hadoop/logs does not exist. Creating.
hadoop-slave-1: WARNING: /opt/hadoop/logs does not exist. Creating.
Starting secondary namenodes [hadoop-master]
2018-02-19 16:14:24,025 WARN util.NativeCodeLoader:
Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable
Go to the Web UI: http://10.242.5.88:9870
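The UI endpoint can also be probed from the shell; a quick sketch with curl (install it with yum install curl if missing), which should print an HTTP success status line while the NameNode is up:
# curl -sI http://hadoop-master:9870/ | head -n 1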
# hdfs dfsadmin -report
2018-02-19 16:45:18,724 WARN util.NativeCodeLoader:
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 160982630400 (149.93 GB)
Present Capacity: 154067443712 (143.49 GB)
DFS Remaining: 154067419136 (143.49 GB)
DFS Used: 24576 (24 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 10.242.5.89:9866 (hadoop-slave-1)
Hostname: hadoop-slave-1
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2305089536 (2.15 GB)
DFS Remaining: 51355779072 (47.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 95.70%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Feb 19 16:45:17 MSK 2018
Last Block Report: Mon Feb 19 16:44:14 MSK 2018
Name: 10.242.5.90:9866 (hadoop-slave-2)
Hostname: hadoop-slave-2
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2304983040 (2.15 GB)
DFS Remaining: 51355885568 (47.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 95.70%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Feb 19 16:45:17 MSK 2018
Last Block Report: Mon Feb 19 16:14:47 MSK 2018
Name: 10.242.5.91:9866 (hadoop-slave-3)
Hostname: hadoop-slave-3
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2305114112 (2.15 GB)
DFS Remaining: 51355754496 (47.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 95.70%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Feb 19 16:45:18 MSK 2018
Last Block Report: Mon Feb 19 16:14:23 MSK 2018
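Finally, a short functional test of HDFS (the /test directory name is arbitrary, chosen just for this check):
$ hdfs dfs -mkdir /test
$ hdfs dfs -put /etc/hosts /test/
$ hdfs dfs -ls /test
The second column of the listing is the replication factor; with dfs.replication=2 it should read 2, and the file can be read back with hdfs dfs -cat /test/hosts.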