I want to merge more than 40 Delta tables into a single target table. I have already merged one Delta table into the target; the code is given below.
from delta.tables import DeltaTable
from pyspark.sql.functions import col, lit
merging_column = "HASH_ID"
target_dir = "targetPath"
source_dir = "sourcePath"
sourceTable = spark.read.format("delta").load(source_dir)
targetTable = DeltaTable.forPath(spark, target_dir)
source_columns = sourceTable.columns
target_columns = targetTable.toDF().columns
columns_to_insert={}
columns_to_update={}
# Build the column maps from the target schema: source columns are copied
# through; columns missing from the source are filled with an empty string
for column in target_columns:
    print("column: " + column)
    if column in source_columns:
        columns_to_insert[column.upper()] = col("source." + column)
    else:
        columns_to_insert[column.upper()] = lit("")

# Update every column except the merge key itself
columns_to_update = columns_to_insert.copy()
del columns_to_update[merging_column]
print("Merge started")
targetTable.alias("target")\
.merge(
sourceTable.alias("source"),
"target."+ merging_column +"= source."+ merging_column)\
.whenMatchedUpdate(set = columns_to_update) \
.whenNotMatchedInsert(values = columns_to_insert) \
.execute()
This works fine for a single source, but I have multiple sources that all need to be merged into the same target.
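What I am considering is wrapping the single-source merge in a function and calling it once per source path, roughly as in the sketch below. This is a minimal sketch, not a tested solution: the list source_dirs and the function merge_into_target are hypothetical names I introduced, and it assumes every source shares the HASH_ID merge key and that a spark session already exists, as in the code above.

from delta.tables import DeltaTable
from pyspark.sql.functions import col, lit

merging_column = "HASH_ID"
target_dir = "targetPath"

# Hypothetical list holding the 40+ source paths (placeholder values)
source_dirs = ["sourcePath1", "sourcePath2", "sourcePath3"]

def merge_into_target(source_dir):
    # Same single-source logic as above, wrapped so it can be reused
    sourceTable = spark.read.format("delta").load(source_dir)
    targetTable = DeltaTable.forPath(spark, target_dir)
    source_columns = sourceTable.columns
    target_columns = targetTable.toDF().columns

    columns_to_insert = {}
    for column in target_columns:
        if column in source_columns:
            columns_to_insert[column.upper()] = col("source." + column)
        else:
            columns_to_insert[column.upper()] = lit("")
    columns_to_update = columns_to_insert.copy()
    del columns_to_update[merging_column]

    targetTable.alias("target") \
        .merge(
            sourceTable.alias("source"),
            "target." + merging_column + " = source." + merging_column) \
        .whenMatchedUpdate(set=columns_to_update) \
        .whenNotMatchedInsert(values=columns_to_insert) \
        .execute()

# One merge per source, run sequentially so concurrent writes
# do not conflict on the same target Delta table
for source_dir in source_dirs:
    merge_into_target(source_dir)

Running the merges one at a time seems like the safe default, since concurrent MERGE operations on the same Delta table can conflict. If HASH_ID values never collide across sources, another option might be to union the sources first (for example with unionByName) and run a single merge, but I am not sure that holds for my data.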