Posts for November 2018

Spark SQL Window with UDAF function

This short post shows a full example of using a Spark window together with a UDAF.

1) I have a Cassandra cluster with a column family that will be used as our source data.

2) I want to calculate the average "disp" over a window made of the current row plus the two following rows (see the SparkWindow figure).

3) Scala code to load the source Dataset from Cassandra. I exclude some columns, take just a part of the data (ticker=1 and width=30), and sort it ascending by ts_begin (a unix timestamp from the external system):

```scala
def getBarsFromCass(TickerID: Int, BarWidthSec: Int) = {
  import org.apache.spark.sql.functions._
  spark.read.format("org.apache.spark.sql.cassandra")
    .options(Map("table" -> "bars", "keyspace" -> "mts_bars"))
    .load()
    .where(col("ticker_id") === TickerID && col("bar_width_sec") === BarWidthSec)
    .select(col("ts_begin"), col("btype"), col("disp"))
    .sort(col("ts_begin").asc)
}
```
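In Spark itself the frame from step 2 would be declared as `Window.orderBy("ts_begin").rowsBetween(Window.currentRow, 2)`. To make the frame semantics concrete without a running cluster, here is a minimal plain-Scala sketch of the same calculation: for each row, average "disp" over the current row plus up to two following rows. The `Bar` case class and the sample values are made up for illustration and are not part of the original post.

```scala
// Plain-Scala sketch of a "current row plus two following rows" average,
// mirroring Spark's Window.orderBy("ts_begin").rowsBetween(Window.currentRow, 2).
object WindowAvgSketch {
  // Hypothetical row type: ts_begin and the "disp" value we average.
  case class Bar(tsBegin: Long, disp: Double)

  def avgDispNext2(bars: Seq[Bar]): Seq[(Long, Double)] = {
    val sorted = bars.sortBy(_.tsBegin)      // Spark orders the window by ts_begin
    sorted.indices.map { i =>
      // Frame: current row plus up to two following rows
      // (shorter near the end of the partition, just as in Spark).
      val frame = sorted.slice(i, math.min(i + 3, sorted.size))
      (sorted(i).tsBegin, frame.map(_.disp).sum / frame.size)
    }
  }

  def main(args: Array[String]): Unit = {
    val bars = Seq(Bar(10L, 1.0), Bar(20L, 2.0), Bar(30L, 3.0), Bar(40L, 4.0))
    // For ts=10 the frame is rows 10,20,30 -> avg 2.0; for ts=30 only 30,40 -> 3.5.
    avgDispNext2(bars).foreach { case (ts, a) => println(s"$ts -> $a") }
  }
}
```

Note how the frame shrinks near the end of the ordered data: Spark behaves the same way, averaging over however many following rows actually exist inside the frame.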