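Loading JSON data into a Hive table using Spark SQL

Time: 2018-02-19 11:02:53

Tags: apache-spark amazon-s3 hive pyspark hiveql

I am trying to write a DataFrame out as JSON data and load it into a Hive table. Here is my sample data: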

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.lit
// toDF on a Seq needs spark.implicits._ (spark-shell imports it automatically)
import spark.implicits._
val df = Seq((2012, 8, "Batman", 9.8), (2012, 8, "Hero", 8.7), (2012, 7, "Robot", 5.5), (2011, 7, "Git", 2.0)).toDF("year", "month", "title", "rating")

I convert each row to a JSON object:

import org.apache.spark.sql.functions._

val finalJsonDF = df.select(to_json(struct("year", "month", "title", "rating"))).as("test")

I can view the data, but the resulting column is named:

structstojson(named_struct(NamePlaceholder(), year, NamePlaceholder(), month, NamePlaceholder(), title, NamePlaceholder(), rating))
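The generated name suggests that `.as("test")` was applied to the DataFrame returned by `select` (a Dataset alias) rather than to the `to_json` column. The following is a minimal sketch of the difference, assuming the intent was to name the column `test`. The local `SparkSession` and the names `unnamed` and `named` are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, to_json}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("alias-sketch").getOrCreate()
import spark.implicits._

val df = Seq((2012, 8, "Batman", 9.8), (2011, 7, "Git", 2.0))
  .toDF("year", "month", "title", "rating")

// Dataset.as("test") aliases the whole DataFrame; the column keeps its generated name.
val unnamed = df.select(to_json(struct("year", "month", "title", "rating"))).as("test")

// Column.as("test") renames the column itself.
val named = df.select(to_json(struct("year", "month", "title", "rating")).as("test"))

unnamed.printSchema() // single string column with a generated name
named.printSchema()   // single string column called "test"
```

Even with the alias moved, `test` is still a JSON string column, not a struct.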

Now I am trying to create a table and load the DataFrame into it:

finalJsonDF.show()

finalJsonDF.write.json("s3://tmp/test")

spark.sql(""" drop table if exists test.sample_test""")

spark.sql("""create external table test.sample_test 
(
test struct<year:String,
month:String, 
title:String, 
rating:String>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as TextFile
location "s3://tmp/test"
""")

spark.sql(""" describe test.sample_test""").show()

spark.sql(""" select * from  test.sample_test""").show()

The query returns only empty rows.
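One possible cause of the empty rows is that the written files do not match the table definition. Each JSON line holds a single string field keyed by the generated column name, while the table expects a key `test` holding a nested object. The sketch below, offered under that assumption and not verified against the original cluster, writes `test` as a struct column so each line should look like `{"test":{"year":2012,"month":8,"title":"Batman","rating":9.8}}`. The local session and the output path `/tmp/test_nested` are hypothetical stand-ins for the real environment and S3 location.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("nested-json-sketch").getOrCreate()
import spark.implicits._

val df = Seq((2012, 8, "Batman", 9.8), (2012, 8, "Hero", 8.7), (2012, 7, "Robot", 5.5), (2011, 7, "Git", 2.0))
  .toDF("year", "month", "title", "rating")

// Keep `test` as a struct (no to_json); the JSON writer emits it as a nested object.
val nested = df.select(struct("year", "month", "title", "rating").as("test"))

// Hypothetical local path; replace with the table's location.
nested.write.mode("overwrite").json("/tmp/test_nested")

// Read the files back to check the key and nesting before querying through Hive.
spark.read.json("/tmp/test_nested").printSchema()
```

If the external table points at this output, the struct field types may also need to match the data (for example `int` for `year` and `month`, `double` for `rating`); whether the SerDe coerces JSON numbers into `String` fields is not confirmed here.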

0 answers:

No answers yet.