
Posts from May 2018

Loading data into Spark from Oracle RDBMS, CSV

This article shows how to read data from Oracle tables via JDBC and directly from CSV files with Spark.

First, download the Oracle JDBC driver ojdbc6.jar and put it into the jars directory under the Spark home, for example C:\spark-2.3.0-bin-hadoop2.7\jars.

Example #1. Load data from an Oracle partitioned table, partition by partition, into a Parquet file. aaa.xxx.yyyy.zzz is the IP address of the Oracle server.

import java.io.File
import org.apache.spark.sql.{Row, SaveMode, SparkSession}

val warehouseLocation = new File("spark-warehouse").getAbsolutePath

val spark = SparkSession.builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .enableHiveSupport()
  .getOrCreate()

// recreate table
spark.sql(s"DROP TABLE IF EXISTS T_DATA")
spark.sql(s"CREATE TABLE IF NOT EXISTS T_DATA(id_row STRING,
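As a minimal sketch of the JDBC read the example builds toward: the snippet below reads one Oracle table over JDBC and writes it out as Parquet. The table name SCHEMA.T_SRC, the credentials, the service name ORCL, and the output path are all hypothetical placeholders, not from the article; only the driver jar and the IP placeholder come from the text above.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Oracle JDBC read sketch")
  .getOrCreate()

// Read one Oracle table over JDBC.
// aaa.xxx.yyyy.zzz is the Oracle server IP placeholder from the article;
// table name, credentials, and service name are assumptions for illustration.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//aaa.xxx.yyyy.zzz:1521/ORCL")
  .option("dbtable", "SCHEMA.T_SRC")
  .option("user", "scott")
  .option("password", "tiger")
  .option("driver", "oracle.jdbc.OracleDriver")
  .load()

// Write the result to Parquet (output path is also an assumption)
df.write.mode("overwrite").parquet("/tmp/t_src_parquet")
```

In practice you would add options such as partitionColumn, lowerBound, upperBound, and numPartitions so Spark issues parallel range queries instead of a single full-table scan.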