Question

I have 2 dfs that look like this:

DF1:

ID  year  notes  score
12  2015  text   15.1
54  2014  text   18.4

DF2:

id_num  year  score
12      2015  15.1
12      2014  12.9
54      2014  18.4

I am trying to create a new df that keeps all the data from df1 and adds only the score column from df2, matched where df1.year = df2.year + 1. Like this:

ID  year   notes  score  prior_yr_score
12  2015   text   15.1   12.9

I have been reading the pandas documentation, but I have not found a way to do this kind of conditional join. In SQL I could do

select a.*, b.score as prior_yr_score
from df1 as a left join df2 as b
on a.ID=b.id_num and a.year = b.year+1

In Python, however, I am stuck at

merged=pd.merge(df1, df2, how='left',left_on='ID',right_on='id_num')

How can I do this in a single statement (pd.merge or otherwise)?

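Edit: I have read several other posts and docs about SQL-style joins in Python, but have not found a clear answer. For example, this post looks similar, but in the answers it seems the OP was actually trying to compute an aggregate metric by condition, rather than join 2 dfs on a condition.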

Answer 1

In [92]: d1.merge(d2.assign(year=d2.year+1, prior_yr_score=d2.score).drop(columns='score'), left_on=['ID','year'], right_on=['id_num','year'])
Out[92]:
   ID  year notes  score  id_num  prior_yr_score
0  12  2015  text   15.1      12            12.9

Answer 2

Could you add a column to df2 that holds year + 1, and then merge on that new column?
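
A minimal sketch of that idea, assuming the sample frames from the question. The helper column name `match_year` is made up for illustration, and `how="left"` is chosen to mirror the SQL left join in the question; it is not necessarily what this answer's author had in mind.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "ID": [12, 54],
    "year": [2015, 2014],
    "notes": ["text", "text"],
    "score": [15.1, 18.4],
})
df2 = pd.DataFrame({
    "id_num": [12, 12, 54],
    "year": [2015, 2014, 2014],
    "score": [15.1, 12.9, 18.4],
})

# Hypothetical helper column: the df1 year that each df2 row is "prior" to.
prior = (
    df2.assign(match_year=df2["year"] + 1)
       .rename(columns={"score": "prior_yr_score"})
       [["id_num", "match_year", "prior_yr_score"]]
)

# Equality join on (ID, year) == (id_num, year + 1), then drop the helper keys.
merged = df1.merge(
    prior,
    how="left",
    left_on=["ID", "year"],
    right_on=["id_num", "match_year"],
).drop(columns=["id_num", "match_year"])

print(merged)
```

With `how="left"`, the row for ID 54 is kept with a missing `prior_yr_score`, as the SQL left join would do. To get only the matched row shown in the question's expected output, switch to `how="inner"` or call `merged.dropna(subset=["prior_yr_score"])`.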
