Install Hive 2.3.2 on the Hadoop (3.0.0) NameNode, with the Hive metastore in an external PostgreSQL database.
See the previous post (Install and configure Hadoop 3 cluster); this cluster's NameNode is used in the next articles.
1) Download and extract the Hive binary
cd /opt
wget http://apache-mirror.rbc.ru/pub/apache/hive/hive-2.3.2/apache-hive-2.3.2-bin.tar.gz
tar -xvf apache-hive-2.3.2-bin.tar.gz
mv apache-hive-2.3.2-bin hive
chown -R hadoop /opt/hive
Configure the environment: in the hadoop user's home directory, edit the file .bashrc
hadoop# cd ~
vi .bashrc
Now it looks like this:
# .bashrc
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin
...
...
We have added HIVE_HOME and extended PATH. Don't forget to execute
source .bashrc
after you finish editing .bashrc.
Download the parquet-hive-bundle jar (for Parquet support) and copy it into Hive's lib directory:
wget http://central.maven.org/maven2/org/apache/parquet/parquet-hive-bundle/1.9.0/parquet-hive-bundle-1.9.0.jar
cp parquet-hive-bundle-1.9.0.jar /opt/hive/lib
Now we need to configure the Hive metastore database.
The Hive metastore service stores the metadata for Hive tables and partitions in a relational database, and provides clients (including Hive) access to this information using the metastore service API.
I will use "Local Mode", where the Hive metastore service runs in the same process as the main HiveServer process, but the metastore database runs on a separate host (10.242.5.62). The embedded metastore service communicates with the metastore database over JDBC (you can find the driver library in /opt/hive/lib/).
[hadoop@hadoop-master conf]$ pwd
/opt/hive/conf
[hadoop@hadoop-master conf]$ ls /opt/hive/lib | grep postgr
postgresql-9.4.1208.jre7.jar
postgresql-metadata-storage-0.9.2.jar
Now we configure Hive,
[hadoop@hadoop-master conf]$ pwd
/opt/hive/conf
[hadoop@hadoop-master conf]$ vi hive-site.xml
and make the following changes:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://10.242.5.62:5432/hive_meta?createDatabaseIfNotExist=true</value>
    <description>JDBC connection string used by Hive Metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
    <description>JDBC Driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Metastore database user name</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>Metastore database password</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://10.242.5.88:9084</value>
    <description>Thrift server hostname and port</description>
  </property>
  <property>
    <name>hive.server2.use.SSL</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>
  ...
</configuration>
Go to your preferred RDBMS IDE, connect to the server 10.242.5.62:5432 and create a new database hive_meta. Create a role hive with password hive and grant it privileges on the database hive_meta. Go back and execute
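If you prefer the command line over an IDE, the same provisioning can be done with psql. A minimal sketch, assuming psql is available and you can log in as the postgres superuser on the database host:

```shell
# Run against the metastore database host (10.242.5.62) as a Postgres superuser.
# Creates the metastore role, the database owned by it, and grants privileges.
psql -h 10.242.5.62 -U postgres <<'SQL'
CREATE ROLE hive LOGIN PASSWORD 'hive';
CREATE DATABASE hive_meta OWNER hive;
GRANT ALL PRIVILEGES ON DATABASE hive_meta TO hive;
SQL
```

This is a provisioning fragment for a live server, so adjust the host and superuser name to your environment.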
# schematool -dbType postgres -initSchema
You will see that it creates the tables in the database hive_meta.
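You can also let schematool confirm what it set up; its -info option prints the connection URL, the metastore user, and the installed schema version:

```shell
# Verify the metastore schema that initSchema just created
schematool -dbType postgres -info
```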
[hadoop@hadoop-master root]$ whoami
hadoop
[hadoop@hadoop-master root]$ hive --service metastore
2018-03-07 10:51:48: Starting Hive Metastore Server
...
/************************************************************
STARTUP_MSG: Starting HiveMetaStore
STARTUP_MSG: host = hadoop-master/10.242.5.88
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.3.2
STARTUP_MSG: classpath = /opt/hive/conf:/opt/hive/lib.......
...
STARTUP_MSG: build = git://stakiar-MBP.local/Users/stakiar/Desktop/scratch-space/apache-hive -r 857a9fd8ad725a53bd95c1b2d6612f9b1155f44d; compiled by 'stakiar' on Thu Nov 9 09:11:39 PST 2017
************************************************************/
2018-03-07T10:51:57,659 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Starting hive metastore on port 9084
2018-03-07T10:51:57,837 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2018-03-07T10:52:09,935 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Added admin role in metastore
2018-03-07T10:52:09,970 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Added public role in metastore
2018-03-07T10:52:10,079 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - No user is added in admin role, since config is empty
2018-03-07T10:52:10,978 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Starting DB backed MetaStore Server with SetUGI enabled
2018-03-07T10:52:11,026 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Started the new metaserver on port [9084]...
2018-03-07T10:52:11,026 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Options.minWorkerThreads = 200
2018-03-07T10:52:11,026 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Options.maxWorkerThreads = 1000
2018-03-07T10:52:11,026 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - TCP keepalive = true
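Before moving on, you can confirm from another shell that the metastore Thrift port from hive.metastore.uris is actually listening (on older systems `ss` may need to be replaced with `netstat -tlnp`):

```shell
# The metastore should be listening on port 9084, as configured above
ss -tlnp | grep 9084
```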
Next, open a new SSH console on the Hadoop NameNode 10.242.5.88 (where we run Hive) and start hiveserver2:
[hadoop@hadoop-master root]$ whoami
hadoop
[hadoop@hadoop-master root]$ hive --service hiveserver2
which: no hbase in (/sbin:/bin:/usr/sbin:/usr/bin:/opt/hadoop/sbin:/opt/hadoop/bin:/opt/hive/bin:/opt/maven/bin)
2018-03-07 10:56:16: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
OK, next we can connect to the Hive "database" with DBeaver or any other IDE. I will use DBeaver.
While a query is executing in DBeaver, you can see the execution stages in the console where you ran hiveserver2 (right part of the image).
It is also possible to see the execution plan.
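The same HiveServer2 endpoint DBeaver uses can be reached from the shell with beeline, which ships with Hive. A sketch, assuming HiveServer2 listens on its default port 10000 and authentication is NONE as configured above:

```shell
# Connect over JDBC and print the execution plan for a query;
# with authentication NONE only a user name (-n) is needed
beeline -u jdbc:hive2://10.242.5.88:10000 -n hadoop \
        -e 'EXPLAIN SELECT count(*) FROM hive_cc1;'
```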
In addition, there are a Hive web panel and a CLI (command-line interface):
http://10.242.5.88:10002
[hadoop@hadoop-master conf]$ whoami
hadoop
[hadoop@hadoop-master conf]$ hive
which: no hbase in (/sbin:/bin:/usr/sbin:/usr/bin:/opt/hadoop/sbin:/opt/hadoop/bin:/opt/hive/bin:/opt/maven/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show tables;
OK
hive_cc1
Time taken: 3.544 seconds, Fetched: 1 row(s)
hive> desc hive_cc1;
OK
ctime double
carnum string
Time taken: 0.435 seconds, Fetched: 2 row(s)
hive> select count(*) from hive_cc1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180307113231_97c6c0df-e819-4b38-b044-949430f46331
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Job running in-process (local Hadoop)
2018-03-07 11:32:44,692 Stage-1 map = 0%, reduce = 0%
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2018-03-07 11:32:54,872 Stage-1 map = 100%, reduce = 0%
2018-03-07 11:33:02,035 Stage-1 map = 50%, reduce = 0%
2018-03-07 11:33:03,104 Stage-1 map = 100%, reduce = 0%
2018-03-07 11:33:04,115 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local827732673_0001
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 21959177 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
500000
Time taken: 32.857 seconds, Fetched: 1 row(s)
hive>