Question

I am using the following code to insert DataFrame data directly into a Databricks Delta table:

eventDataFrame.write.format("delta").mode("append").option("inferSchema","true").insertInto("some delta table")

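However, if the column order the Delta table was created with differs from the DataFrame's column order, the values get jumbled and are not written to the correct columns. How can I preserve the order? Is there a standard way or best practice for doing this?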

Answer 1

This is simple:

```
# In PySpark

# The table we ultimately want to insert into
df = spark.read.table("TARGET_TABLE")

# df_increment is the DataFrame to insert; its columns may be in any order.
# Reorder its columns to match the target table before inserting.
df_increment = df_increment.select(df.columns)
df_increment.write.insertInto("TARGET_TABLE")
```

So in your case:

parent_df = spark.read.table("some delta table")
eventDataFrame.select(parent_df.columns).write.format("delta").mode("append").option("inferSchema","true").insertInto("some delta table")
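
The reordering above presumes that the DataFrame and the table have the same set of columns. If a column is missing, `select` fails, and extra columns are dropped without notice. A minimal sketch of a guard that makes both cases explicit is shown here. The helper name `columns_in_target_order` is hypothetical (it is not a Spark API), and it assumes exact, case-sensitive name matching, which may differ from how your Spark session resolves names.

```python
def columns_in_target_order(source_columns, target_columns):
    """Return the target table's column order, after checking that the
    source has exactly the same column names.

    Hypothetical helper, not part of Spark. Assumes exact,
    case-sensitive name matching.
    """
    missing = [c for c in target_columns if c not in source_columns]
    extra = [c for c in source_columns if c not in target_columns]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    return list(target_columns)


# Usage with Spark (needs an active session, so shown as comments only):
# target = spark.read.table("TARGET_TABLE")
# ordered = columns_in_target_order(eventDataFrame.columns, target.columns)
# eventDataFrame.select(ordered).write.insertInto("TARGET_TABLE")
```

Failing early with a clear message is a design choice here; silently dropping or defaulting columns may be preferable in some pipelines.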

Answer 2

With saveAsTable the column order does not matter; Spark finds the correct column position by column name.

eventDataFrame.write.format("delta").mode("append").option("inferSchema","true").saveAsTable("foo")

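From the Spark documentation:

> The column order in the schema of the DataFrame doesn't need to be the same as that of the existing table. Unlike insertInto, saveAsTable will use the column names to find the correct column positions.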
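
The difference between positional and name-based matching can be illustrated in plain Python, without Spark. This is only a simulation of the two behaviors, not how Spark implements them; the column names `id` and `event` and both function names are made up for the example.

```python
def insert_by_position(row, source_columns, target_columns):
    """Mimic positional matching: the i-th source value lands in the
    i-th target column, regardless of names."""
    values = [row[c] for c in source_columns]
    return dict(zip(target_columns, values))


def insert_by_name(row, target_columns):
    """Mimic name-based matching: each target column takes the value
    of the source column with the same name."""
    return {c: row[c] for c in target_columns}


target_columns = ["id", "event"]   # order the table was created with
source_columns = ["event", "id"]   # order of the DataFrame
row = {"event": "click", "id": 7}

print(insert_by_position(row, source_columns, target_columns))
# {'id': 'click', 'event': 7}   (values end up in the wrong columns)
print(insert_by_name(row, target_columns))
# {'id': 7, 'event': 'click'}
```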

How to ensure the correct column order when doing spark dataframe.write().insertInto("table")?
