I have the following two DataFrames:
df1:
+------------+------------+-----------+
| date       |Advertiser  |Impressions|
+------------+------------+-----------+
|2020-01-08  |b           |50035      |
|2020-01-08  |c           |70000      |
+------------+------------+-----------+

df2:
+------------+------------+-----------+
| date       |Advertiser  |Impressions|
+------------+------------+-----------+
|2020-01-07  |b           |10000      |
|2020-01-07  |c           |25260      |
+------------+------------+-----------+
For each Advertiser, I want to compute df1.Impressions - df2.Impressions and save the result in a new DataFrame, df3:
+------------+------------+----------------+
| date |Advertiser |diff Impressions|
+------------+------------+----------------+
|2020-01-08 |b |40035 |
|2020-01-08 |c |44740 |
+------------+------------+----------------+
Answer 0 (score: 0)
You can join the two DataFrames on the Advertiser column and then select the columns you need:
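df3 = df1.join(df2, 'Advertiser').select(
    df1.date,  # keep df1's date; both inputs have a date column
    'Advertiser',  # joining on the column name leaves a single Advertiser column
    (df1.Impressions - df2.Impressions).alias('diff Impressions'),
)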
df3.show()
+----------+----------+----------------+
| date|Advertiser|diff Impressions|
+----------+----------+----------------+
|2020-01-08| b| 40035|
|2020-01-08| c| 44740|
+----------+----------+----------------+
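To make explicit what `df1.join(df2, 'Advertiser')` followed by the subtraction computes, here is a plain-Python sketch that uses the sample rows from the question. It illustrates the logic of an inner join only and says nothing about how Spark actually executes it. The function name `diff_impressions` is hypothetical and is not part of PySpark.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (date, Advertiser, Impressions)

# Sample rows from the question.
df1_rows: List[Row] = [
    ("2020-01-08", "b", 50035),
    ("2020-01-08", "c", 70000),
]
df2_rows: List[Row] = [
    ("2020-01-07", "b", 10000),
    ("2020-01-07", "c", 25260),
]


def diff_impressions(left: List[Row], right: List[Row]) -> List[Row]:
    """Hypothetical helper: inner-join on Advertiser, subtract Impressions.

    Every matching (left, right) pair produces one output row, advertisers
    missing from either side are dropped, and the date comes from the left side.
    """
    # Index right-hand Impressions by the join key; a key may have several rows.
    right_by_key: Dict[str, List[int]] = {}
    for _date, advertiser, impressions in right:
        right_by_key.setdefault(advertiser, []).append(impressions)

    result: List[Row] = []
    for date, advertiser, impressions in left:
        for other in right_by_key.get(advertiser, []):
            result.append((date, advertiser, impressions - other))
    return result


df3_rows = diff_impressions(df1_rows, df2_rows)
for row in df3_rows:
    print(row)
```

Because the join key is Advertiser alone, this only gives one row per advertiser when each advertiser appears once in each DataFrame, as in the sample data. If df2 held several dates for the same advertiser, each df1 row would be paired with every one of them. In that case you would want to filter df2 down to the comparison date before joining.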