I want to convert a DataFrame to a JSON object and load it into a JSON table.
Here is the code.
Create the table:
spark.sql("""create table IF NOT EXISTS user_tech.tests (
Z struct<A:string,
B:string,
C:string>
)
stored as orc """)
import org.apache.spark.sql._
Initial DataFrame:
val df = Seq((1,2,3),(2,3,4)).toDF("A", "B", "C")
val jsonColumns = df.select("A", "B", "C")
Convert it to JSON:
import org.apache.spark.sql.functions._
val finalDF = jsonColumns.select(to_json(struct(col("A"), col("B"), col("C")))).as("Z")
Insert the rows into the table:
finalDF.registerTempTable("test")
spark.sql(""" select * from test """).show()
spark.sql("""Insert into user_tech.tests select * from test""")
I get the following error:
org.apache.spark.sql.AnalysisException: cannot resolve 'test.`structstojson(named_struct(NamePlaceholder(), A, NamePlaceholder(), B, NamePlaceholder(), C))`' due to data type mismatch: cannot cast StringType to StructType(StructField(guid,StringType,true), StructField(sessionid,StringType,true));;
Answer 0 (score: 1)
The problem is in the following statement.
val finalDF = jsonColumns.select(to_json(struct(col("A"), col("B"), col("C")))).as("Z")
A quick check of the DataFrame above shows that you are creating a single column of type String.
scala> finalDF.show
+--------------------------------------------------------------------------------------------+
|structtojson(named_struct(NamePlaceholder(), A, NamePlaceholder(), B, NamePlaceholder(), C))|
+--------------------------------------------------------------------------------------------+
| {"A":1,"B":2,"C":3}|
| {"A":2,"B":3,"C":4}|
+--------------------------------------------------------------------------------------------+
scala> finalDF.printSchema
root
|-- structtojson(named_struct(NamePlaceholder(), A, NamePlaceholder(), B, NamePlaceholder(), C)): string (nullable = true)
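When you try to insert from the temp table registered on finalDF, the schemas do not match (the table column Z is a struct, but the column you are inserting is a string), so you get the exception.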
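To make the mismatch concrete, here is a minimal sketch that compares the schema produced by `to_json` with the schema produced by `struct` alone. It assumes it is pasted into spark-shell, where `spark` and its implicits are already in scope; the names `jsonDF` and `structDF` are illustrative and not from the original post. Note also that in the question `.as("Z")` is called on the result of `select`, which aliases the whole DataFrame rather than naming the column, which is why the column keeps its generated name. In the sketch the alias is applied to the column itself.

```scala
import org.apache.spark.sql.functions.{col, struct, to_json}
import org.apache.spark.sql.types.{StringType, StructType}

// Assumption: running inside spark-shell, so toDF is available via spark.implicits._
val df = Seq((1, 2, 3), (2, 3, 4)).toDF("A", "B", "C")

// to_json serializes the struct, so the resulting column Z is a plain string.
val jsonDF = df.select(to_json(struct(col("A"), col("B"), col("C"))).as("Z"))

// struct alone keeps the nested fields, which is the shape the table column Z expects.
val structDF = df.select(struct(col("A"), col("B"), col("C")).as("Z"))

jsonDF.printSchema()
structDF.printSchema()

// Programmatic check of the two column types.
val jsonIsString = jsonDF.schema("Z").dataType == StringType
val structIsStruct = structDF.schema("Z").dataType.isInstanceOf[StructType]
```

Checking the schema before inserting is a cheap way to catch this kind of mismatch, since the insert only fails at analysis time with a fairly opaque cast error.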
The following should work for you.
spark.sql("""create table IF NOT EXISTS tests (
Z struct<A:string,
B:string,
C:string>
)
stored as orc """)
import org.apache.spark.sql._
val df = Seq((1,2,3),(2,3,4)).toDF("A", "B", "C")
val jsonColumns = df.select("A", "B", "C")
jsonColumns.createOrReplaceTempView("tmp") // registerTempTable is deprecated since Spark 2.0
spark.sql("""Insert into tests select struct(*) from tmp""")
You can view the data with the following statement.
spark.sql("select * from tests").show
+-------+
| Z|
+-------+
|[1,2,3]|
|[1,2,3]|
|[2,3,4]|
|[2,3,4]|
+-------+
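The same insert can also be sketched with the DataFrame API instead of a temp view. This is an alternative, not what the answer above uses. It assumes spark-shell (so `spark` and its implicits are in scope) and that the `tests` table created above already exists. The explicit casts to string are an addition here, so that the struct fields match the declared `struct<A:string,B:string,C:string>` exactly instead of relying on implicit conversion during the insert.

```scala
import org.apache.spark.sql.functions.{col, struct}

// Assumption: running inside spark-shell, so toDF is available via spark.implicits._
val df = Seq((1, 2, 3), (2, 3, 4)).toDF("A", "B", "C")

// Build a single struct column named Z whose fields are strings,
// mirroring the table definition struct<A:string,B:string,C:string>.
val structDF = df.select(
  struct(
    col("A").cast("string").as("A"),
    col("B").cast("string").as("B"),
    col("C").cast("string").as("C")
  ).as("Z")
)

// insertInto resolves columns by position and requires the target table to exist.
structDF.write.mode("append").insertInto("tests")

spark.sql("select * from tests").show()
```

Because `insertInto` is position-based, the order of columns in the DataFrame matters more than their names; with a single column Z that is not an issue here.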
Hope that helps!