Posts

Posts from April 2018

A Big Data calculation example (Hive on HDFS, SparkSQL with Scala on local NTFS)

We have a task with the following description. There is a JSON file (events.json) with events:

...
{"_t":1507087562,"_p":"sburke@test.com","_n":"app_loaded","device_type":"desktop"}
{"_t":1505361824,"_p":"pfitza@test.com","_n":"added_to_team","account":"1234"}
{"_t":1505696264,"_p":"keiji@test.com","_n":"registered","channel":"Google_Apps"}
...

There are about 500000 lines (JSON objects) in the file, and only these 3 types of objects are possible:

_t - a timestamp;
_p - an email, which we can use as a unique identifier of a user;
_n - the event type (app_loaded - the application was loaded by a user, registered - a user has registered);
and the last field is an additional attribute: device_type, account or channel.

Task: 1) Load data from the JSON file by events, only app_loaded and registered, into 2 parquet
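The per-event selection the task describes can be sketched in plain Scala (in Spark itself this would be `spark.read.json(...)` plus a filter on `_n`; here the grouping logic is shown standalone, with `_n` extracted by a simple regex under the assumption of flat, one-object-per-line JSON):

```scala
// Sketch: split raw event lines by event type (_n), keeping only the two
// types the task asks for, before each group would be written to its own
// Parquet file. Plain Scala for illustration; Spark would do this with
// spark.read.json(...).filter($"_n" === "app_loaded"), etc.
object EventSplit {
  // crude extraction of the "_n" field from one JSON line (assumes flat objects)
  private val EventName = """"_n"\s*:\s*"([^"]+)"""".r.unanchored

  def eventType(line: String): Option[String] =
    line match {
      case EventName(n) => Some(n)
      case _            => None
    }

  // keep only app_loaded and registered events, grouped by event type
  def split(lines: Seq[String]): Map[String, Seq[String]] =
    lines
      .flatMap(l => eventType(l).map(_ -> l))
      .filter { case (n, _) => n == "app_loaded" || n == "registered" }
      .groupBy(_._1)
      .map { case (n, pairs) => n -> pairs.map(_._2) }
}
```

With the three sample lines above, `split` returns one `app_loaded` line and one `registered` line, and drops the `added_to_team` event.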

Load a CSV into a Hive table (example: Bangkok Districts) (unfinished article)

In this article I begin the big preparation needed to finish a research work: using Big Data by the police to search for a serial criminal.

Article 1. Dictionary preparation. We need to prepare and load into the Hive database some new dictionaries, such as districts, road web cams, car government registration numbers and car owner information.

First of all, I download the Wikipedia table "List of districts of Bangkok" and save it with Excel as a CSV file named bkk_dist_csv.csv. The data is a three-column table: district code, English name and Thai name.

Create a directory for dictionaries in HDFS:

$ hadoop fs -mkdir /user/data/dics

Load the file with the Hadoop WebUI and check:

$ hadoop fs -ls /user/data/dics/
Found 1 items
-rw-r--r-- 3 dr.who supergroup 1202 2018-04-03 12:01 /user/data/dics/bkk_dist_csv.csv

And load it into a Hive table:

drop table IF EXISTS d_src;
CREATE EXTERNAL TABLE d_src(
  code string,
  eng_name string,
  thai_Name string
)
COMMENT 'source external table for loading csv'
ROW FORMAT DELIMITED FIEL
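The row format the external table declares can be sketched in plain Scala: a delimited line is mapped onto the three declared columns (code, eng_name, thai_Name). This assumes the usual `FIELDS TERMINATED BY ','` for a CSV file; the sample row in the usage note is hypothetical, not taken from the real bkk_dist_csv.csv.

```scala
// Sketch of the row format the external table declares: a comma-delimited
// line mapped onto the three columns (code, eng_name, thai_Name). This
// mirrors what a "ROW FORMAT DELIMITED" comma-separated serde would do;
// sample values used with it are hypothetical.
final case class District(code: String, engName: String, thaiName: String)

object DistrictCsv {
  // split on the field delimiter; limit -1 keeps trailing empty fields
  def parseLine(line: String): Option[District] =
    line.split(",", -1) match {
      case Array(code, eng, thai) => Some(District(code.trim, eng.trim, thai.trim))
      case _                      => None // malformed row: wrong column count
    }
}
```

For example, `DistrictCsv.parseLine("01,Phra Nakhon,พระนคร")` yields `Some(District("01", "Phra Nakhon", "พระนคร"))`, while a row with the wrong number of fields yields `None`.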

Hive partitioned EXTERNAL tables, create, load data, select, managing

First of all, create a database for studying external tables. Before that, look at the directory structure:

$ hadoop fs -ls /user/data
Found 6 items
-rw-r--r-- 2 hadoop supergroup 161 2018-03-02 14:31 /user/data/cc2.avsc
drwxr-xr-x - hadoop supergroup 0 2018-04-03 12:27 /user/data/dics
drwxr-xr-x - hadoop supergroup 0 2018-03-15 15:42 /user/data/js1.json
drwxr-xr-x - hadoop supergroup 0 2018-03-16 10:52 /user/data/js_db
drwxr-xr-x - hadoop supergroup 0 2018-03-02 13:49 /user/data/order.parquet
-rw-r--r-- 2 dr.who supergroup 389 2018-02-28 16:10 /user/data/test_data.csv

Create a new database with a set location folder, setting some properties such as a comment and the creator:

CREATE DATABASE IF NOT EXISTS ext_tabs
COMMENT 'Database for studing external tables'
LOCATION '/user/data/exttabs'
WITH DBPROPERTIES ('creator' = 'Yakushev Aleksey', 'date' = '2018-04-05');

Now che
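For the partitioned external tables this article is about, Hive lays data out under the table's LOCATION in key=value subdirectories, one per partition. That path convention can be sketched in plain Scala (the table name and partition keys below are made up for illustration, not from the article):

```scala
// Sketch of Hive's partition directory convention: each partition of a
// table lives under <table location>/<key1>=<value1>/<key2>=<value2>/...
// The table and partition names used with this are illustrative only.
object HiveLayout {
  def partitionPath(tableLocation: String, partitions: Seq[(String, String)]): String =
    (tableLocation +: partitions.map { case (k, v) => s"$k=$v" }).mkString("/")
}
```

For a hypothetical table under the ext_tabs database location, partitioned by year and month, `HiveLayout.partitionPath("/user/data/exttabs/events", Seq("year" -> "2018", "month" -> "04"))` produces `/user/data/exttabs/events/year=2018/month=04`.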