Install Hive 2.3.2 on a Hadoop (3.0.0) NameNode, with the Hive metastore in an external PostgreSQL database.

This guide builds on the previous post (Install and configure Hadoop 3 cluster); that cluster's NameNode is used here and in the next articles.

1) Download and extract the Hive binary

cd /opt
wget http://apache-mirror.rbc.ru/pub/apache/hive/hive-2.3.2/apache-hive-2.3.2-bin.tar.gz
tar -xvf apache-hive-2.3.2-bin.tar.gz
mv apache-hive-2.3.2-bin hive
chown -R hadoop /opt/hive


2) Configure the environment. In the hadoop user's home directory, edit .bashrc:


hadoop# cd ~
vi .bashrc

After editing it looks like this:

# .bashrc

export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin

...
...


We have added HIVE_HOME and modified PATH.

Don't forget to execute

source .bashrc

after you exit the editor.
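
A quick sanity check that Hive is now on the PATH (assuming the new .bashrc has been sourced):

which hive
hive --version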

3) Download the parquet-hive-bundle library and add it to Hive's lib directory:


wget http://central.maven.org/maven2/org/apache/parquet/parquet-hive-bundle/1.9.0/parquet-hive-bundle-1.9.0.jar
cp parquet-hive-bundle-1.9.0.jar /opt/hive/lib
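
You can confirm the jar is in place (the same kind of check we use for the PostgreSQL driver below):

ls /opt/hive/lib | grep parquet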



4) Now we need to configure the Hive metastore database.

The Hive metastore service stores the metadata for Hive tables and partitions in a relational database, and provides clients (including Hive) access to this information using the metastore service API.


I will use "Local Mode", where the Hive metastore service runs in the same process as the main HiveServer process, while the metastore database runs on a separate host (10.242.5.62). The embedded metastore service communicates with the metastore database over JDBC (the PostgreSQL driver already ships in /opt/hive/lib/):

[hadoop@hadoop-master conf]$ pwd
/opt/hive/conf
[hadoop@hadoop-master conf]$ ls /opt/hive/lib | grep postgr
postgresql-9.4.1208.jre7.jar
postgresql-metadata-storage-0.9.2.jar


Now we configure Hive:

[hadoop@hadoop-master conf]$ pwd
/opt/hive/conf
[hadoop@hadoop-master conf]$ vi hive-site.xml
and make the following changes:

<configuration>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://10.242.5.62:5432/hive_meta?createDatabaseIfNotExist=true</value>
    <description>JDBC connection string used by Hive Metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
    <description>JDBC Driver class</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Metastore database user name</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>Metastore database password</description>
  </property>

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://10.242.5.88:9084</value>
    <description>Thrift server hostname and port</description>
  </property>

  <property>
    <name>hive.server2.use.SSL</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>

  ...
  ...

</configuration>
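
Note that hive.metastore.uris points at a non-default port (9084; the Hive default is 9083). Judging by the metastore startup log below, the server picks its listen port up from this same setting, so the client and server sides stay in sync.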


5) Using your preferred RDBMS IDE (or plain psql), connect to the server 10.242.5.62:5432, create a new database hive_meta, create a role hive with password hive, and grant it privileges on hive_meta. For example, in a psql session on the PostgreSQL host (a minimal sketch; adjust role options and the password to your environment):
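
CREATE ROLE hive WITH LOGIN PASSWORD 'hive';
CREATE DATABASE hive_meta OWNER hive;
GRANT ALL PRIVILEGES ON DATABASE hive_meta TO hive;

Then go back to the NameNode and execute: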


# schematool -dbType postgres -initSchema


You will see it create the metastore tables in the hive_meta database.
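
To double-check from the database side, list the tables (optional; assumes psql can reach the host):

psql -h 10.242.5.62 -U hive -d hive_meta -c '\dt'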




6) Now we can start the metastore service:


[hadoop@hadoop-master root]$ whoami
hadoop
[hadoop@hadoop-master root]$ hive --service metastore
2018-03-07 10:51:48: Starting Hive Metastore Server
...
/************************************************************
STARTUP_MSG: Starting HiveMetaStore
STARTUP_MSG:   host = hadoop-master/10.242.5.88
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.3.2
STARTUP_MSG:   classpath = /opt/hive/conf:/opt/hive/lib.......
...
STARTUP_MSG:   build = git://stakiar-MBP.local/Users/stakiar/Desktop/scratch-space/apache-hive -r 857a9fd8ad725a53bd95c1b2d6612f9b1155f44d; compiled by 'stakiar' on Thu Nov 9 09:11:39 PST 2017
************************************************************/
2018-03-07T10:51:57,659 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Starting hive metastore on port 9084
2018-03-07T10:51:57,837 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2018-03-07T10:52:09,935 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Added admin role in metastore
2018-03-07T10:52:09,970 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Added public role in metastore
2018-03-07T10:52:10,079 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - No user is added in admin role, since config is empty
2018-03-07T10:52:10,978 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Starting DB backed MetaStore Server with SetUGI enabled
2018-03-07T10:52:11,026 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Started the new metaserver on port [9084]...
2018-03-07T10:52:11,026 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Options.minWorkerThreads = 200
2018-03-07T10:52:11,026 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Options.maxWorkerThreads = 1000
2018-03-07T10:52:11,026 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStore - TCP keepalive = true



7) Next, open a new SSH session to the Hadoop NameNode 10.242.5.88 (where we run Hive) and start hiveserver2:


[hadoop@hadoop-master root]$ whoami
hadoop
[hadoop@hadoop-master root]$ hive --service hiveserver2
which: no hbase in (/sbin:/bin:/usr/sbin:/usr/bin:/opt/hadoop/sbin:/opt/hadoop/bin:/opt/hive/bin:/opt/maven/bin)
2018-03-07 10:56:16: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
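
For anything beyond a quick test you will probably want both services detached from the terminal, e.g. (a sketch; the log file locations are arbitrary):

nohup hive --service metastore > /opt/hive/metastore.log 2>&1 &
nohup hive --service hiveserver2 > /opt/hive/hiveserver2.log 2>&1 &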



OK, next we can connect to the Hive "database" with DBeaver or any other IDE. I will use DBeaver.
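
DBeaver connects over JDBC; since we did not override hive.server2.thrift.port, HiveServer2 listens on its default port 10000, and the URL is:

jdbc:hive2://10.242.5.88:10000

You can also test the connection from the shell with beeline, which ships with Hive:

beeline -u jdbc:hive2://10.242.5.88:10000 -n hadoop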



If the connection succeeds, you can execute SQL queries in Hive; see the examples in the image. While a query is executing in DBeaver, you can watch its execution stages in the console where hiveserver2 was started (right part of the image).


It is also possible to view the execution plan.
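
For example, prefix the query with EXPLAIN (hive_cc1 is the table used in the CLI session below):

EXPLAIN SELECT count(*) FROM hive_cc1;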


In addition, there are a Hive web panel (the HiveServer2 web UI) and a CLI (command-line interface):
http://10.242.5.88:10002







CLI:


[hadoop@hadoop-master conf]$ whoami
hadoop
[hadoop@hadoop-master conf]$ hive
which: no hbase in (/sbin:/bin:/usr/sbin:/usr/bin:/opt/hadoop/sbin:/opt/hadoop/bin:/opt/hive/bin:/opt/maven/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show tables;

OK
hive_cc1
Time taken: 3.544 seconds, Fetched: 1 row(s)
hive> desc hive_cc1;

OK

ctime                   double
carnum                  string

Time taken: 0.435 seconds, Fetched: 2 row(s)
hive> select count(*) from hive_cc1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180307113231_97c6c0df-e819-4b38-b044-949430f46331
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2018-03-07 11:32:44,692 Stage-1 map = 0%,  reduce = 0%
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2018-03-07 11:32:54,872 Stage-1 map = 100%,  reduce = 0%
2018-03-07 11:33:02,035 Stage-1 map = 50%,  reduce = 0%
2018-03-07 11:33:03,104 Stage-1 map = 100%,  reduce = 0%
2018-03-07 11:33:04,115 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local827732673_0001
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 21959177 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
500000
Time taken: 32.857 seconds, Fetched: 1 row(s)
hive>
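
By the way, the Hive-on-MR deprecation warning above can be avoided by switching the execution engine, e.g.

set hive.execution.engine=tez;

but that requires installing the corresponding engine (Tez or Spark) first; for this test setup MapReduce is fine.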

