I'd like to know how subtract works:
target_df = df.subtract(df1)
Does it return to target_df the rows of df that are not in df1, or the rows of df1 that are not in df?
Answer 0 (score: 0)
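Let's assume the following example:
df1 has the values (1,2,3,4,5,6)
df2 has the values (3,4,5,6,7,8)
Then target_df = df1.subtract(df2) contains "the values in df1 minus the values common to both DataFrames", that is, the rows of df1 that do not appear in df2:
(1,2,3,4,5,6) - (3,4,5,6) = (1,2)
Applied to the question, df.subtract(df1) returns the rows of df that are not in df1.
Run the following code to check: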
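from pyspark.sql import Row, SparkSession
# Reuse the active session (for example in the pyspark shell) or create one
spark = SparkSession.builder.getOrCreate()
df1 = spark.sparkContext.parallelize([Row(1), Row(2), Row(3), Row(4), Row(5), Row(6)]).toDF()
df2 = spark.sparkContext.parallelize([Row(3), Row(4), Row(5), Row(6), Row(7), Row(8)]).toDF()
# Keep only the rows of df1 that are absent from df2
target_df = df1.subtract(df2)
# Shows the rows 1 and 2 (row order is not guaranteed)
target_df.show()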
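If it helps to see the semantics without a Spark cluster, here is a minimal pure-Python sketch of what subtract computes. The helper name subtract_rows is hypothetical and is not how Spark implements it. It only models the documented behaviour, which is equivalent to EXCEPT DISTINCT in SQL: the result holds the distinct rows of the left side that do not appear on the right side. Spark itself does not guarantee any row order; the sketch keeps the first-seen order only to make the output easy to read.

```python
def subtract_rows(left, right):
    """Hypothetical model of DataFrame.subtract (EXCEPT DISTINCT):
    distinct rows of `left` that do not appear in `right`."""
    right_rows = set(right)
    seen = set()
    result = []
    for row in left:
        # Skip rows present on the right side and rows already emitted
        if row not in right_rows and row not in seen:
            seen.add(row)
            result.append(row)
    return result


df1_rows = [1, 2, 3, 4, 5, 6]
df2_rows = [3, 4, 5, 6, 7, 8]

# df1.subtract(df2): rows of df1 that are not in df2
print(subtract_rows(df1_rows, df2_rows))   # [1, 2]

# df2.subtract(df1): the direction matters
print(subtract_rows(df2_rows, df1_rows))   # [7, 8]

# Duplicates on the left side are collapsed as well
print(subtract_rows([1, 1, 2, 3], [3]))    # [1, 2]
```

So the operation is not symmetric: the DataFrame you call subtract on is the one whose rows are kept, and the argument is the one whose rows are removed.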