Posts

Posts from January 2019

Read Write Map fields from Cassandra with Scala

In this short example, I show how to read and write Map columns in Cassandra.

The Cassandra table has these columns:
ticker_id - ID of a financial instrument such as EURUSD, ORCL, APPL, GOLD.
ddate     - the calendar date of the day ('2019-01-25', '2019-01-26', ...).

CREATE TABLE mts_bars.td_bars_3600(
  ticker_id int,
  ddate date,
  bar_1 map , bar_2 map , bar_3 map , bar_4 map , bar_5 map , bar_6 map ,
  bar_7 map , bar_8 map , bar_9 map , bar_10 map , bar_11 map , bar_12 map ,
  bar_13 map , bar_14 map , bar_15 map , bar_16 map , bar_17 map , bar_18 map ,
  bar_19 map , bar_20 map , bar_21 map , bar_22 map , bar_23 map , bar_24 map ,
  PRIMARY KEY ((ticker_id), ddate)
) WITH CLUSTERING ORDER BY (ddate DESC);

The primary key contains two columns, ticker_id and ddate.
ticker_id - the partition key; it is responsible for data distribution across your nodes.
ddate     - the clustering key; it is responsible for...
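To make the read/write idea concrete, here is a minimal sketch using the DataStax Java driver 3.x from Scala. It is not the post's own code. Assumptions: the map columns are treated as map<text, text> (the type parameters are not shown in the table definition above), the contact point 127.0.0.1 is a placeholder, and the keys and values written to bar_1 are made-up sample data.

```scala
import com.datastax.driver.core.{Cluster, LocalDate}
import scala.collection.JavaConverters._

object MapColumnSketch {
  // The driver works with java.util.Map, so convert from a Scala Map.
  def toJavaMap(m: Map[String, String]): java.util.Map[String, String] = m.asJava

  def main(args: Array[String]): Unit = {
    // Assumption: a Cassandra node is reachable at this placeholder address.
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    try {
      val session = cluster.connect()
      // CQL `date` values are represented by the driver's own LocalDate type.
      val day = LocalDate.fromYearMonthDay(2019, 1, 25)

      // Write: bind a java.util.Map to the map column (sample data, hypothetical).
      val insert = session.prepare(
        "INSERT INTO mts_bars.td_bars_3600 (ticker_id, ddate, bar_1) VALUES (?, ?, ?)")
      val bar = Map("o" -> "1.1301", "c" -> "1.1312")
      session.execute(insert.bind(Int.box(1), day, toJavaMap(bar)))

      // Read: getMap needs the key and value classes (assumed String/String here).
      val select = session.prepare(
        "SELECT bar_1 FROM mts_bars.td_bars_3600 WHERE ticker_id = ? AND ddate = ?")
      val row = session.execute(select.bind(Int.box(1), day)).one()
      if (row != null) {
        val bar1: Map[String, String] =
          row.getMap("bar_1", classOf[String], classOf[String]).asScala.toMap
        println(bar1)
      }
    } finally {
      cluster.close()
    }
  }
}
```

Both statements filter on ticker_id, so each one touches a single partition, and ddate then selects one row inside it.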

Load data from Cassandra to HDFS parquet files and select with Hive

HDFS directory preparation on the Hadoop cluster. Run the commands below from the LXC container named hdpnn. The Hadoop and Cassandra cluster installation is described in this article.

My idea is to write a Scala application that runs on a Spark cluster and loads data from Cassandra into HDFS Parquet files for later analysis with Hive. First I do it manually, step by step.

1) HDFS directory preparation

[hadoop@hdpnn ~]$ hadoop fs -mkdir /user/tickers
[hadoop@hdpnn ~]$ hadoop fs -rm -r /user/tickers/ticker_23
[hadoop@hdpnn ~]$ hdfs dfs -chown root:root /user/tickers
[hadoop@hdpnn ~]$ hadoop fs -ls /user
drwxr-xr-x - root root 0 2019-01-10 06:27 /user/tickers

2) Load data with Spark using spark-shell (on smn)

spark-shell --driver-memory 1G --executor-memory 1G --driver-cores 1 --executor-cores 1 --jars "/opt/spark-2.3.2/jars/spark-cassandra-connector-assembly-2.3.2.jar" --conf spark.cassandra.connection.host=192.168.122.192 --conf "spark.sql.parquet.writeL...
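The Cassandra-to-Parquet load described here could look roughly like the following standalone Scala application. This is a sketch under stated assumptions, not the post's own code. The keyspace and table names (mts_bars, td_bars_3600) and the output subdirectory bars_3600 are illustrative guesses. The connection host is the one passed to spark-shell above, and the Spark Cassandra connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object CassandraToParquetSketch {
  // Assumption: source keyspace and table; replace with the table you want to export.
  val cassandraOptions: Map[String, String] =
    Map("keyspace" -> "mts_bars", "table" -> "td_bars_3600")

  // Assumption: hypothetical target directory under /user/tickers on HDFS.
  val outputPath: String = "/user/tickers/bars_3600"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CassandraToParquetSketch")
      .config("spark.cassandra.connection.host", "192.168.122.192")
      .getOrCreate()

    // Read the Cassandra table as a DataFrame through the connector's data source.
    val bars = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(cassandraOptions)
      .load()

    // Write the rows as Parquet files; "overwrite" replaces any earlier export.
    bars.write.mode("overwrite").parquet(outputPath)

    spark.stop()
  }
}
```

In spark-shell, the same read and write calls can be pasted directly, because the shell already provides the `spark` session.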