Big Data calculation example (Hive on HDFS, SparkSQL with Scala on local NTFS)
We have a task with the following description. There is a json file (events.json) with events:
...
...
{"_t":1507087562,"_p":"sburke@test.com","_n":"app_loaded","device_type":"desktop"}
{"_t":1505361824,"_p":"pfitza@test.com","_n":"added_to_team","account":"1234"}
{"_t":1505696264,"_p":"keiji@test.com","_n":"registered","channel":"Google_Apps"}
...
...
The file contains about 500000 lines (json objects), and only these 3 types of objects are possible:
_t - a timestamp
_p - an email, which we can use as a unique identifier of the user
_n - the event type (app_loaded - the application was loaded by the user, registered - the user has registered)
The last attribute is an additional one that depends on the event type: device_type, account, or channel.

Task:
1) Load data from the json file by events, only app_loaded and registered, into 2 parquet ...
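For step 1), a minimal SparkSQL-with-Scala sketch could look as follows. This is an illustration under assumptions, not the article's final code: the output paths (app_loaded.parquet, registered.parquet) and the local[*] master are placeholders, and the input is read straight from the local file system, as the title suggests.

import org.apache.spark.sql.SparkSession

object LoadEvents {
  def main(args: Array[String]): Unit = {
    // Local run, matching the "SparkSQL with Scala on local NTFS" setup from the title.
    val spark = SparkSession.builder()
      .appName("LoadEvents")
      .master("local[*]")
      .getOrCreate()

    // events.json holds one json object per line, so spark.read.json can parse it directly.
    val events = spark.read.json("events.json")

    // Keep only the two event types the task asks for and write each to its own parquet output.
    // Output paths are assumed names for this sketch.
    events.filter(events("_n") === "app_loaded")
      .write.mode("overwrite").parquet("app_loaded.parquet")

    events.filter(events("_n") === "registered")
      .write.mode("overwrite").parquet("registered.parquet")

    spark.stop()
  }
}

The filter on the _n column splits the single input into the two event-specific datasets before writing, so each parquet output contains only the columns and rows relevant to its event type.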