Cannot load a Spark JSON DataFrame into a Hive table

Asked: 2018-02-16 18:39:28

Tags: scala apache-spark apache-spark-sql spark-dataframe

I want to convert a DataFrame into JSON objects and load them into a Hive table.

Here is the code.

Create the table

spark.sql("""create table IF NOT EXISTS user_tech.tests (
Z struct<A:string, 
 B:string,
 C:string>
)
stored as orc """)

import org.apache.spark.sql._

Initial DataFrame

val df = Seq((1,2,3),(2,3,4)).toDF("A", "B", "C")    


val jsonColumns = df.select("A", "B", "C")

Convert it to JSON

import org.apache.spark.sql.functions._
val finalDF = jsonColumns.select(to_json(struct(col("A"), col("B"), col("C")))).as("Z")

Insert the rows into the table

finalDF.registerTempTable("test")

spark.sql(""" select * from test """).show()

spark.sql("""Insert into  user_tech.tests select * from test""")

I get the following error:

org.apache.spark.sql.AnalysisException: cannot resolve 'test.`structstojson(named_struct(NamePlaceholder(), A, NamePlaceholder(), B, NamePlaceholder(), C))`' due to data type mismatch: cannot cast StringType to StructType(StructField(guid,StringType,true), StructField(sessionid,StringType,true));;

1 Answer:

Answer 0 (score: 1)

The problem is in the following statement.

val finalDF = jsonColumns.select(to_json(struct(col("A"), col("B"), col("C")))).as("Z")

A quick check of that DataFrame shows that you are creating a single column of type String, not a struct.

scala> finalDF.show
+--------------------------------------------------------------------------------------------+
|structtojson(named_struct(NamePlaceholder(), A, NamePlaceholder(), B, NamePlaceholder(), C))|
+--------------------------------------------------------------------------------------------+
|                                                                         {"A":1,"B":2,"C":3}|
|                                                                         {"A":2,"B":3,"C":4}|
+--------------------------------------------------------------------------------------------+


scala> finalDF.printSchema
root
 |-- structtojson(named_struct(NamePlaceholder(), A, NamePlaceholder(), B, NamePlaceholder(), C)): string (nullable = true)

When you try to insert from the temp table registered on finalDF, that string column does not match the struct column the target table expects, so the schemas mismatch and you get the exception.
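For comparison, you can look at the schema the target table expects (a quick sketch; DESCRIBE works against Hive tables from Spark SQL and should list Z as struct<A:string,B:string,C:string>):

scala> spark.sql("describe user_tech.tests").show(false)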

The following should work for you.

spark.sql("""create table IF NOT EXISTS tests (
Z struct<A:string, 
 B:string,
 C:string>
)
stored as orc """)


import org.apache.spark.sql._

val df = Seq((1,2,3),(2,3,4)).toDF("A", "B", "C")    

val jsonColumns = df.select("A", "B", "C")

jsonColumns.registerTempTable("tmp")

spark.sql("""Insert into tests select struct(*) from tmp""")

You can check the data with the following statement.

spark.sql("select * from tests").show


+-------+
|      Z|
+-------+
|[1,2,3]|
|[1,2,3]|
|[2,3,4]|
|[2,3,4]|
+-------+
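And since the original goal was JSON, note that to_json is still useful, just on the way out rather than on the way in: once Z is stored as a struct, you can serialize it to a JSON string when reading it back (a small sketch against the table above; Z_json is just an illustrative alias):

import org.apache.spark.sql.functions._

// Serialize the struct column back to a JSON string when reading.
// Each row comes back as a JSON string such as {"A":"1","B":"2","C":"3"}.
spark.table("tests").select(to_json(col("Z")).as("Z_json")).show(false)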

Hope that helps!