
Posts from March 2018

JSON in Hive 2.3.2 with Hive-JSON-Serde (load data, query tables, complex structures)

Previous posts: hadoop cluster and hive installation. In this topic we will go through several steps: creating a local JSON file, loading it into HDFS, creating an external Hive table, running queries against that table, etc. First of all, create a local JSON file with notepad or vi, name it js1.json, and populate it with this data:

{"ts":1520318907,"device":1,"metric":"p","value":100}
{"ts":1520318908,"device":2,"metric":"p","value":110}
{"ts":1520318909,"device":1,"metric":"v","value":8}
{"ts":1520318910,"device":2,"metric":"v","value":9}
{"ts":1520318911,"device":1,"metric":"p","value":120}
{"ts":1520318912,"device":2,"metric":"p","value":140}
{"ts":1520318913,"device":1,"metric":
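The local-file part of the workflow above can be sketched and sanity-checked without a cluster. This is a minimal sketch, assuming js1.json is created in the current directory with the six complete records shown in the post (the seventh record is truncated in the preview and is omitted); the Hive-JSON-Serde expects exactly this layout, one JSON object per line.

```shell
# Create js1.json locally with the records from the post.
cat > js1.json <<'EOF'
{"ts":1520318907,"device":1,"metric":"p","value":100}
{"ts":1520318908,"device":2,"metric":"p","value":110}
{"ts":1520318909,"device":1,"metric":"v","value":8}
{"ts":1520318910,"device":2,"metric":"v","value":9}
{"ts":1520318911,"device":1,"metric":"p","value":120}
{"ts":1520318912,"device":2,"metric":"p","value":140}
EOF

# Check that every line parses as a standalone JSON object with the
# expected keys (newline-delimited JSON, as the SerDe requires).
python3 - <<'EOF'
import json
with open("js1.json") as f:
    records = [json.loads(line) for line in f]
assert all({"ts", "device", "metric", "value"} <= set(r) for r in records)
print("OK:", len(records), "records")
EOF

# On the cluster the file would then be uploaded with something like
#   hdfs dfs -put js1.json /user/hadoop/js1.json
# before creating the external table (not run here).
```

Validating the file locally first avoids the common failure mode where a pretty-printed (multi-line) JSON document is loaded and every query returns NULLs, since the SerDe parses one record per line.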

Comparing file formats supported by Hive, with some examples, and changing the replication factor

Previous posts about the hadoop 3.0 cluster and hive install are used in this article. Hive version used:

[hadoop@hadoop-master myhql]$ hive --version
Hive 2.3.2

You can see here the supported file formats (Storage Formats). I am going to try the following four:

SEQUENCEFILE - stored as a compressed SequenceFile.
ORC (Optimized Row Columnar) - stored in the ORC file format; supports ACID transactions and the Cost-Based Optimizer (CBO), and stores column-level metadata.
PARQUET - stored in the Parquet columnar storage format.
AVRO - stored in the Avro format.

A simple script to create and populate tables with test data:

[hadoop@hadoop-master myhql]$ hostname
hadoop-master
[hadoop@hadoop-master myhql]$ pwd
/opt/hive/myhql
[hadoop@hadoop-master myhql]$ ls
pop_tables.hql
[hadoop@hadoop-master myhql]$ vi pop_tables.hql
drop table tmp_seq_part_date;
drop table tmp_orc_part_date;
drop table tmp_prq_part_date;
drop table tmp_avro_part_d
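The comparison script above is cut off in this preview, so here is a hedged sketch of what pop_tables.hql might look like. Only the first three table names and the four STORED AS formats come from the post; the column list, the partition key `dt`, and the completed fourth table name are my assumptions for illustration.

```shell
# Write a hypothetical pop_tables.hql: one table per storage format under
# comparison. Schema (id, val, partition dt) is assumed, not from the post.
cat > pop_tables.hql <<'EOF'
DROP TABLE IF EXISTS tmp_seq_part_date;
DROP TABLE IF EXISTS tmp_orc_part_date;
DROP TABLE IF EXISTS tmp_prq_part_date;
DROP TABLE IF EXISTS tmp_avro_part_date;

-- Hypothetical schema; the point is the differing STORED AS clause.
CREATE TABLE tmp_seq_part_date (id INT, val STRING) PARTITIONED BY (dt STRING) STORED AS SEQUENCEFILE;
CREATE TABLE tmp_orc_part_date (id INT, val STRING) PARTITIONED BY (dt STRING) STORED AS ORC;
CREATE TABLE tmp_prq_part_date (id INT, val STRING) PARTITIONED BY (dt STRING) STORED AS PARQUET;
CREATE TABLE tmp_avro_part_date (id INT, val STRING) PARTITIONED BY (dt STRING) STORED AS AVRO;
EOF

# Locally we can only sanity-check the script text; on the cluster it
# would be executed with: hive -f pop_tables.hql
grep -c 'STORED AS' pop_tables.hql
```

Keeping the schema identical across all four tables is what makes the comparison fair: any difference in size or query time then comes from the storage format alone.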

Install Hive 2.3.2 on a Hadoop (3.0.0) NameNode, with the Hive metastore on an external Postgres database.

Previous post: Install and configure Hadoop 3 cluster. That cluster's NameNode is used in the next articles.

1) Download and extract the Hive binary:

cd /opt
wget http://apache-mirror.rbc.ru/pub/apache/hive/hive-2.3.2/apache-hive-2.3.2-bin.tar.gz
tar -xvf apache-hive-2.3.2-bin.tar.gz
mv apache-hive-2.3.2-bin hive
chown -R hadoop /opt/hive

Configure the environment in the hadoop user's home directory by editing the .bashrc file:

hadoop# cd ~
vi .bashrc

It now looks like this:

# .bashrc
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin
...
...

We have added HIVE_HOME and modified PATH. Don't forget to execute source .bashrc after you finish editing .bashrc.
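The PATH logic of the .bashrc above can be exercised on its own, without Hadoop installed. This is a minimal sketch using the paths from the post (/opt/hadoop, /opt/hive); the directories do not need to exist for the PATH construction itself to be checked.

```shell
# Reproduce the relevant exports from the .bashrc shown above.
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$HIVE_HOME/bin

# After `source ~/.bashrc` on the real node, the `hive` command resolves
# because $HIVE_HOME/bin is now on PATH:
case ":$PATH:" in
  *":/opt/hive/bin:"*) echo "PATH ok" ;;
  *) echo "PATH missing hive bin" ;;
esac
```

A quick check like this catches the usual mistakes (a typo in HIVE_HOME, or forgetting to re-source .bashrc) before you get an opaque "hive: command not found".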