Load CSV into Hive table (example: Bangkok Districts) (Not finished article)
In this article I begin big preparation to finish research work - Using Big Data by police to search serial criminal.
Article 1. Dictionaries preparation.
We need prepare and load into hive database some new dictionaries- like Districts, road web cams, cars government registration numbers and cars owners information.
First of all I download wikipedia table - List of districts of Bangkok and save it with Excel as csv file with name bkk_dist_csv.csv
Data looks like
Download file into Hive table
$ hadoop fs -mkdir /user/data/dics
Load with Hadoop WebUI and check
$ hadoop fs -ls /user/data/dics/
Found 1 items
-rw-r--r-- 3 dr.who supergroup 1202 2018-04-03 12:01 /user/data/dics/bkk_dist_csv.csv
And load it into Hive table
drop table IF EXISTS d_src; CREATE EXTERNAL TABLE d_src( code string, eng_name string, thai_Name string ) COMMENT 'source external table for loading scv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' STORED AS TEXTFILE location '/user/data/dics/bkk_dist_csv.csv'; select * from d_src d_src.code |d_src.eng_name |d_src.thai_name | -----------|---------------|----------------| -- load csv with hadoop web interface or with copyFromLocal/put create table d_bkk_dist( code int, eng_name string, thai_Name string ) COMMENT 'Dictionary - List of districts of Bangkok' STORED AS ORC; INSERT OVERWRITE TABLE d_bkk_dist select cast(replace(d.code,' ','') as int) as code, replace(d.eng_name,' ','') as eng_name, replace(d.thai_name,' ','') as thai_name from d_src d where locate('Code', d.code)=0 drop table IF EXISTS d_src;
Now we need create WebCams and set their properties. Picture and simple description.
Look on this zoomed image.
There are road web cams, each cam has a unique ID, parent district ID.
For test reasons we need addition information, where we can go from the camera (for generate more real test data).
We use equal time for way between different cameras, in real data of course it's important.
CAM_ID DESTIN_CAM_ID
1 2
1 3
1 4
1 5
1 6
I manually print full map, manually write cameras ID, and populate reference table as csv file. The purpose of this all - car that registered in WebCamera (1) next time can be registered only in 2,3,4,5,6. (Fe: not in 1- 8, 3 - 12).
Комментарии
Отправить комментарий