Hadoop 3.0 cluster - installation, configuration, tests on CentOS 7


This article presents, step by step, how to create a Hadoop cluster with 1 name node and 3 slaves.

As a starting point there is one VM (provided by the hardware administrator) with the following properties:

       
10.242.5.88
root/123456Qw
 

       
# cat /etc/centos-release
CentOS Linux release 7.1.1503 (Core)

# uname -a
Linux ders-hadoop1 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 
x86_64 x86_64 GNU/Linux
       
 


Storage information:
       
Filesystem               Size         Used Avail Use%          Mounted on
/dev/mapper/centos-root    50G         914M   50G            2% /
devtmpfs                  1,9G            0  1,9G            0% /dev
tmpfs                     1,9G            0  1,9G            0% /dev/shm
tmpfs                     1,9G         8,3M  1,9G            1% /run
tmpfs                     1,9G            0  1,9G            0% /sys/fs/cgroup
/dev/mapper/centos-home    73G          33M   73G            1% /home
/dev/sda1                 497M         119M  379M           24% /boot
       
 

CPU information:

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 60
model name      : Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz

 

RAM information:

# cat /proc/meminfo
MemTotal:        3876056 kB


and network interface properties:
       
# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.242.5.88  netmask 255.255.255.0  broadcast 10.242.5.255
        inet6 fe80::215:5dff:fe04:6f05  prefixlen 64  scopeid 0x20<link>
        ether 00:15:5d:04:6f:05  txqueuelen 1000  (Ethernet)
        RX packets 46021  bytes 42368623 (40.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14164  bytes 1063856 (1.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# hostname
ders-hadoop1

 

The next step is setting the IPs and hostnames in /etc/hosts and installing Java 8.

# vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.242.5.88 hadoop-master hadoop-master
10.242.5.89 hadoop-slave-1 hadoop-slave-1
10.242.5.90 hadoop-slave-2 hadoop-slave-2
10.242.5.91 hadoop-slave-3 hadoop-slave-3

# vi /etc/hostname
and write hadoop-master here.

Restart the network:
# systemctl restart network.service

and reboot the VM:
# reboot


Now we have the correct hostname

# hostname
hadoop-master
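
To double-check all four /etc/hosts entries before going further, a small loop like the following can be used (a minimal sketch; getent reads /etc/hosts directly, so it works even though the slave VMs don't exist yet):

for h in hadoop-master hadoop-slave-1 hadoop-slave-2 hadoop-slave-3; do
    # getent resolves through /etc/hosts, no DNS or live host required
    getent hosts "$h" || echo "$h is missing from /etc/hosts"
done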


and are ready to install Java:

cd /opt/
wget --no-cookies --no-check-certificate \
--header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" \
"http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.tar.gz"

or http://download.oracle.com/otn-pub/java/jdk/8u171-b11/512cd62ec5174c3487ac17c61aaa89e8/jdk-8u171-linux-x64.tar.gz

# ls -lh
total 181M
-rw-r--r-- 1 root root 181M jdk-8u161-linux-x64.tar.gz

tar xzf jdk-8u161-linux-x64.tar.gz

If you don't have wget, install it with: yum install wget


After extracting the archive, use the alternatives command to register this JDK. The alternatives command is available in the chkconfig package.


# cd /opt/jdk1.8.0_161/
# alternatives --install /usr/bin/java java /opt/jdk1.8.0_161/bin/java 2
# alternatives --config java

There is 1 program that provides 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /opt/jdk1.8.0_161/bin/java

# alternatives --install /usr/bin/jar jar /opt/jdk1.8.0_161/bin/jar 2
# alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_161/bin/javac 2
# alternatives --set jar /opt/jdk1.8.0_161/bin/jar
# alternatives --set javac /opt/jdk1.8.0_161/bin/javac

# java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

# vi /etc/profile
and add the following lines:

export JAVA_HOME=/opt/jdk1.8.0_161
export JRE_HOME=/opt/jdk1.8.0_161/jre
export PATH=$PATH:/opt/jdk1.8.0_161/bin:/opt/jdk1.8.0_161/jre/bin

pathmunge () {
...

and reboot the server.
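
After the reboot it is worth a quick sanity check that the variables from /etc/profile took effect; expected output, assuming the paths above:

# echo $JAVA_HOME
/opt/jdk1.8.0_161
# which java
/usr/bin/java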

Add a hadoop user:

# useradd hadoop
# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password must be at least 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.


Configure key-based login. The hadoop user must be able to SSH to itself without a password.
# su - hadoop
[hadoop@hadoop-master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UT1YjUi5WO9ldVq2VN7/6KzPZMKMKa08JVVYo/eKhIY hadoop@hadoop-master
The key's randomart image is:
+---[RSA 2048]----+
|         .oO+o  o|
|         .*.=..o*|
|        .o.+...==|
|       ..oo...+..|
|      E S... o. .|
|       ..o.*.. ..|
|        .o= = + .|
|       ..o   B   |
|        o.  .o=  |
+----[SHA256]-----+
[hadoop@hadoop-master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'hadoop-master (10.242.5.88)' can't be established.
ECDSA key fingerprint is SHA256:3i32OhdNiKfXfaHUGHQP5dfb+9YHkDbjajRxYKmp8Do.
ECDSA key fingerprint is MD5:05:24:e7:9b:2f:a7:c4:9b:2a:ca:85:96:7b:67:1b:bc.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@hadoop-master's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop@hadoop-master'"
and check to make sure that only the key(s) you wanted were added.
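
Before moving on, verify that key-based login really works: the command below should print the hostname without asking for a password (a quick check, assuming the steps above succeeded):

[hadoop@hadoop-master ~]$ ssh hadoop@hadoop-master hostname
hadoop-master
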
Download and Extract Hadoop

# cd /opt
# wget http://apache-mirror.rbc.ru/pub/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
# tar -xzf hadoop-3.0.0.tar.gz
# mv hadoop-3.0.0 hadoop
# chown -R hadoop /opt/hadoop

Configure Hadoop

[root@hadoop-master hadoop]# pwd
/opt/hadoop/etc/hadoop
[root@hadoop-master hadoop]# vi core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000/</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

Configure hdfs-site.xml and directories

# mkdir /opt/hadoop/dfs
# mkdir /opt/hadoop/dfs/name
# mkdir /opt/hadoop/dfs/data
# pwd
/opt/hadoop/etc/hadoop
# vi hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

Configure mapred-site.xml

# vi mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master:9001</value>
  </property>
</configuration>

Edit hadoop-env.sh file

# vi hadoop-env.sh

# Set Hadoop-specific environment variables here.

export JAVA_HOME=/opt/jdk1.8.0_161
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
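
At this point the configuration can be spot-checked with hdfs getconf, using the full path since the PATH changes come later in this article (note: Hadoop may also print a deprecation warning, because fs.default.name is the old name of fs.defaultFS):

# /opt/hadoop/bin/hdfs getconf -confKey fs.default.name
hdfs://hadoop-master:9000/
# /opt/hadoop/bin/hdfs getconf -confKey dfs.replication
2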


Now I ask our administrator to make 3 clones of the VM with the IPs:

10.242.5.89
10.242.5.90
10.242.5.91
 


On each clone edit the hostname

# vi /etc/hostname
and write the correct slave hostname here (hadoop-slave-1, hadoop-slave-2, or hadoop-slave-3).

Restart the network:
# systemctl restart network.service

Now check that the nodes can ping each other. The next step is checking SSH, as in the sketch below.
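
Because the slaves are clones of the master, their hadoop users already carry the master's key in authorized_keys, so the login should be passwordless. A small loop to confirm it (the first connection to each host will still ask to accept the host key):

[hadoop@hadoop-master ~]$ for h in hadoop-slave-1 hadoop-slave-2 hadoop-slave-3; do
    ssh "hadoop@$h" hostname    # should print the slave's hostname, with no password prompt
done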

Configure the workers and slaves files on the name node only.

# su - hadoop
# pwd
/opt/hadoop/etc/hadoop
# vi workers

and write these 3 lines:
hadoop-slave-1
hadoop-slave-2
hadoop-slave-3

# cp workers slaves
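
The clones were made after /opt/hadoop was configured, so their copies of the configuration already match the master. If anything under /opt/hadoop/etc/hadoop changes on the master later, a loop like this (a hypothetical helper, relying on the passwordless SSH set up above) pushes the change out:

for h in hadoop-slave-1 hadoop-slave-2 hadoop-slave-3; do
    # -a preserves permissions and times; the trailing slash copies directory contents
    rsync -a /opt/hadoop/etc/hadoop/ "hadoop@$h:/opt/hadoop/etc/hadoop/"
done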



Setup Environment Variables

# su - hadoop
# vi ~/.bashrc

export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

and check the Hadoop version:

# source .bashrc
# hadoop version


Hadoop 3.0.0
Source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533
Compiled by andrew on 2017-12-08T19:16Z
Compiled with protoc 2.5.0
From source with checksum 397832cb5529187dc8cd74ad54ff22
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-3.0.0.jar


Next, copy .bashrc to all nodes with:

[hadoop@hadoop-master ~]$ rsync .bashrc hadoop@hadoop-slave-1:/home/hadoop
[hadoop@hadoop-master ~]$ rsync .bashrc hadoop@hadoop-slave-2:/home/hadoop
[hadoop@hadoop-master ~]$ rsync .bashrc hadoop@hadoop-slave-3:/home/hadoop



Run this command on all nodes:
# chown -R hadoop /opt/hadoop

and on the master execute:
# hadoop namenode -format

STARTUP_MSG:   java = 1.8.0_161
************************************************************/
2018-02-19 16:08:45,031 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2018-02-19 16:08:45,045 INFO namenode.NameNode: createNameNode [-format]
2018-02-19 16:08:46,623 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-02-19 16:08:49,568 WARN common.Util: Path /opt/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
2018-02-19 16:08:49,570 WARN common.Util: Path /opt/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-34b701e3-4fd1-4680-beb7-3ac01da82a2a
2018-02-19 16:08:49,917 INFO namenode.FSEditLog: Edit logging is async:true
2018-02-19 16:08:50,052 INFO namenode.FSNamesystem: KeyProvider: null
2018-02-19 16:08:50,071 INFO namenode.FSNamesystem: fsLock is fair: true
2018-02-19 16:08:50,078 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
2018-02-19 16:08:50,103 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
2018-02-19 16:08:50,103 INFO namenode.FSNamesystem: supergroup          = supergroup
2018-02-19 16:08:50,103 INFO namenode.FSNamesystem: isPermissionEnabled = true
2018-02-19 16:08:50,104 INFO namenode.FSNamesystem: HA Enabled: false
2018-02-19 16:08:50,229 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2018-02-19 16:08:50,270 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2018-02-19 16:08:50,270 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2018-02-19 16:08:50,294 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2018-02-19 16:08:50,294 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Feb 19 16:08:50
2018-02-19 16:08:50,299 INFO util.GSet: Computing capacity for map BlocksMap
2018-02-19 16:08:50,299 INFO util.GSet: VM type       = 64-bit
2018-02-19 16:08:50,321 INFO util.GSet: 2.0% max memory 916.4 MB = 18.3 MB
2018-02-19 16:08:50,321 INFO util.GSet: capacity      = 2^21 = 2097152 entries
2018-02-19 16:08:50,453 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2018-02-19 16:08:50,471 INFO Configuration.deprecation: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
2018-02-19 16:08:50,471 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2018-02-19 16:08:50,471 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2018-02-19 16:08:50,471 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: defaultReplication         = 2
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: maxReplication             = 512
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: minReplication             = 1
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
2018-02-19 16:08:50,472 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2018-02-19 16:08:50,968 INFO util.GSet: Computing capacity for map INodeMap
2018-02-19 16:08:50,968 INFO util.GSet: VM type       = 64-bit
2018-02-19 16:08:50,969 INFO util.GSet: 1.0% max memory 916.4 MB = 9.2 MB
2018-02-19 16:08:50,969 INFO util.GSet: capacity      = 2^20 = 1048576 entries
2018-02-19 16:08:50,971 INFO namenode.FSDirectory: ACLs enabled? false
2018-02-19 16:08:50,971 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2018-02-19 16:08:50,971 INFO namenode.FSDirectory: XAttrs enabled? true
2018-02-19 16:08:50,971 INFO namenode.NameNode: Caching file names occurring more than 10 times
2018-02-19 16:08:50,989 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true
2018-02-19 16:08:51,004 INFO util.GSet: Computing capacity for map cachedBlocks
2018-02-19 16:08:51,004 INFO util.GSet: VM type       = 64-bit
2018-02-19 16:08:51,005 INFO util.GSet: 0.25% max memory 916.4 MB = 2.3 MB
2018-02-19 16:08:51,005 INFO util.GSet: capacity      = 2^18 = 262144 entries
2018-02-19 16:08:51,053 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2018-02-19 16:08:51,053 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2018-02-19 16:08:51,053 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2018-02-19 16:08:51,084 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2018-02-19 16:08:51,084 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2018-02-19 16:08:51,089 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2018-02-19 16:08:51,089 INFO util.GSet: VM type       = 64-bit
2018-02-19 16:08:51,090 INFO util.GSet: 0.029999999329447746% max memory 916.4 MB = 281.5 KB
2018-02-19 16:08:51,090 INFO util.GSet: capacity      = 2^15 = 32768 entries
2018-02-19 16:08:51,268 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1839426364-10.242.5.88-1519045731213
2018-02-19 16:08:51,421 INFO common.Storage: Storage directory /opt/hadoop/dfs/name has been successfully formatted.
2018-02-19 16:08:51,482 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2018-02-19 16:08:52,004 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds.
2018-02-19 16:08:52,095 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2018-02-19 16:08:52,120 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/10.242.5.88
************************************************************/



Start the cluster:

If etc/hadoop/workers and SSH trusted access are configured
(see Single Node Setup), all of the HDFS processes can be started with a utility script. As the hadoop user:

# start-dfs.sh

Starting namenodes on [hadoop-master]
Starting datanodes
hadoop-slave-3: WARNING: /opt/hadoop/logs does not exist. Creating.
hadoop-slave-2: WARNING: /opt/hadoop/logs does not exist. Creating.
hadoop-slave-1: WARNING: /opt/hadoop/logs does not exist. Creating.
Starting secondary namenodes [hadoop-master]
2018-02-19 16:14:24,025 WARN util.NativeCodeLoader: 
Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable
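
A quick way to see that the right daemons landed on the right machines is jps over SSH (a minimal sketch; expect NameNode and SecondaryNameNode on the master and a DataNode on each slave; the full JDK path is used because a non-interactive SSH session does not read /etc/profile):

for h in hadoop-master hadoop-slave-1 hadoop-slave-2 hadoop-slave-3; do
    echo "--- $h ---"
    ssh "hadoop@$h" /opt/jdk1.8.0_161/bin/jps
done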




Go to the Web UI: http://10.242.5.88:9870




# hdfs dfsadmin -report

2018-02-19 16:45:18,724 WARN util.NativeCodeLoader: 
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 160982630400 (149.93 GB)
Present Capacity: 154067443712 (143.49 GB)
DFS Remaining: 154067419136 (143.49 GB)
DFS Used: 24576 (24 KB)
DFS Used%: 0.00%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Pending deletion blocks: 0
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 10.242.5.89:9866 (hadoop-slave-1)
Hostname: hadoop-slave-1
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2305089536 (2.15 GB)
DFS Remaining: 51355779072 (47.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 95.70%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Feb 19 16:45:17 MSK 2018
Last Block Report: Mon Feb 19 16:44:14 MSK 2018


Name: 10.242.5.90:9866 (hadoop-slave-2)
Hostname: hadoop-slave-2
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2304983040 (2.15 GB)
DFS Remaining: 51355885568 (47.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 95.70%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Feb 19 16:45:17 MSK 2018
Last Block Report: Mon Feb 19 16:14:47 MSK 2018


Name: 10.242.5.91:9866 (hadoop-slave-3)
Hostname: hadoop-slave-3
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2305114112 (2.15 GB)
DFS Remaining: 51355754496 (47.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 95.70%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Feb 19 16:45:18 MSK 2018
Last Block Report: Mon Feb 19 16:14:23 MSK 2018
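
Finally, a small end-to-end test: write a file into HDFS, read it back, and let fsck confirm it is stored with the configured replication factor of 2 (a sketch of a smoke test, run as the hadoop user; the /user/hadoop path and hosts.txt name are just examples):

hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put /etc/hosts /user/hadoop/hosts.txt       # write a small file into HDFS
hdfs dfs -cat /user/hadoop/hosts.txt                  # read it back
hdfs fsck /user/hadoop/hosts.txt -files -blocks       # shows block count and replication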
