How to write a streaming Dataset to Hive?

Date: 2018-01-15 11:10:46

Tags: apache-spark hive apache-spark-sql spark-structured-streaming

Using Apache Spark 2.2 Structured Streaming, I am writing a program that reads data from Kafka and writes it into Hive. The data arrives on the Kafka topic in bulk, at roughly 100 records/second.

The Hive table I created:

CREATE TABLE demo_user (
  timeaa BIGINT,
  numberbb INT,
  decimalcc DOUBLE,
  stringdd STRING,
  booleanee BOOLEAN
) STORED AS ORC;

Inserting via a manual Hive query works:

INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true);

Inserting via Spark Structured Streaming code:

SparkConf conf = new SparkConf();
conf.setAppName("testing");
conf.setMaster("local[2]");
conf.set("hive.metastore.uris", "thrift://localhost:9083");
SparkSession session = 
SparkSession.builder().config(conf).enableHiveSupport().getOrCreate();

// workaround START: code to insert static data into hive
String insertQuery = "INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true)";
session.sql(insertQuery);
// workaround END:

// Solution START
Dataset<Row> dataset = readFromKafka(session); // private method reading data from Kafka's 'xyz' topic

// My question here:
// some code which writes dataset into hive table demo_user
// Solution END

1 answer:

Answer 0 (score: -1)

There is no need to create the Hive table yourself when using the following; it is created automatically:

dataset.write().jdbc(String url, String table, java.util.Properties connectionProperties)

or use

dataset.write().saveAsTable(String tableName)