Question

I have the following two dataframes in pandas:

DF1:
AuthorID1  AuthorID2  Co-Authored
A1         A2         0
A1         A3         0
A1         A4         0
A2         A3         0

DF2:
AuthorID1  AuthorID2  Co-Authored
A1         A2         5
A2         A3         6
A6         A7         9

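I would like (without looping and comparing) to find the AuthorID1 and AuthorID2 pairs in DF1 that match pairs in DF2 and update the column values accordingly. So the result for the two tables above would be: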

Resulting Updated DF1:
AuthorID1  AuthorID2  Co-Authored
A1         A2         5
A1         A3         0
A1         A4         0
A2         A3         6

Is there a fast way to do this? I have 7 million rows in DF1, and looping and comparing would take forever.

Update: please note that the last row of DF2 (A6, A7) should not be part of the update in DF1, because it does not exist in DF1.

Answer 1

You can use update:

df1.update(df2)
print (df1)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2          5.0
1        A2        A3          6.0
2        A1        A4          0.0
3        A2        A3          0.0

Sample:

import pandas as pd

df1 = pd.DataFrame({'new': {0: 7, 1: 8, 2: 1, 3: 3},
                    'AuthorID2': {0: 'A2', 1: 'A3', 2: 'A4', 3: 'A3'},
                    'AuthorID1': {0: 'A1', 1: 'A1', 2: 'A1', 3: 'A2'},
                    'Co-Authored': {0: 0, 1: 0, 2: 0, 3: 0}})

df2 = pd.DataFrame({'AuthorID2': {0: 'A2', 1: 'A3'},
                    'AuthorID1': {0: 'A1', 1: 'A2'},
                    'Co-Authored': {0: 5, 1: 6}})

print (df1)
  AuthorID1 AuthorID2  Co-Authored  new
0        A1        A2            0    7
1        A1        A3            0    8
2        A1        A4            0    1
3        A2        A3            0    3

print (df2)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2            5
1        A2        A3            6

df1.update(df2)
print (df1)
  AuthorID1 AuthorID2  Co-Authored  new
0        A1        A2          5.0    7
1        A2        A3          6.0    8
2        A1        A4          0.0    1
3        A2        A3          0.0    3
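
Note that update aligns on the row index, not on the AuthorID columns, which is why row 1 above ends up as (A2, A3, 6.0) instead of staying (A1, A3, 0). Below is a minimal sketch of one way to align on the pair instead. It assumes each (AuthorID1, AuthorID2) pair is unique within both frames, and the names keys, left, right and result are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "AuthorID1": ["A1", "A1", "A1", "A2"],
    "AuthorID2": ["A2", "A3", "A4", "A3"],
    "Co-Authored": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "AuthorID1": ["A1", "A2", "A6"],
    "AuthorID2": ["A2", "A3", "A7"],
    "Co-Authored": [5, 6, 9],
})

keys = ["AuthorID1", "AuthorID2"]

# Move the pair columns into the index so update() aligns on the pair
# (assumption: pairs are unique in both frames).
left = df1.set_index(keys)
right = df2.set_index(keys)

# update() only touches index labels already present in `left`,
# so the (A6, A7) row of df2 is ignored.
left.update(right)

result = left.reset_index()
# update() may leave the column as float; cast back (safe here, no NaN present).
result["Co-Authored"] = result["Co-Authored"].astype(int)
print(result)
```

Because the alignment is done through the index, no Python-level loop is involved, which should matter at 7 million rows.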

Edit by comment:

I think you first need to filter df2 by df1 with isin:

df2 = df2[df2[['AuthorID1','AuthorID2']].isin(df1[['AuthorID1','AuthorID2']]).any(axis=1)]
print (df2)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2            5
1        A2        A3            6

df1.update(df2)
print (df1)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2          5.0
1        A2        A3          6.0
2        A1        A4          0.0
3        A2        A3          0.0
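
One caveat: when isin is given a DataFrame, it matches on index and column labels as well as values, so the filter above compares rows at the same position rather than testing whether a pair occurs anywhere in df1. A hedged sketch of a pair-membership filter, built on MultiIndex.from_frame and MultiIndex.isin (the names pairs1, pairs2, mask and df2_matched are illustrative, not from the original answer):

```python
import pandas as pd

df1 = pd.DataFrame({
    "AuthorID1": ["A1", "A1", "A1", "A2"],
    "AuthorID2": ["A2", "A3", "A4", "A3"],
    "Co-Authored": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "AuthorID1": ["A1", "A2", "A6"],
    "AuthorID2": ["A2", "A3", "A7"],
    "Co-Authored": [5, 6, 9],
})

keys = ["AuthorID1", "AuthorID2"]

# Represent every row as an (AuthorID1, AuthorID2) tuple.
pairs1 = pd.MultiIndex.from_frame(df1[keys])
pairs2 = pd.MultiIndex.from_frame(df2[keys])

# True where the df2 pair occurs anywhere in df1, regardless of row position.
mask = pairs2.isin(pairs1)

df2_matched = df2[mask]
print(df2_matched)
```

This only selects the rows of df2 whose pair exists in df1. Writing the values back still needs an alignment on the pair rather than on the row index.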

Answer 2

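You can use the following parameter of DataFrame.update:

filter_func : callable(1d-array) -> 1d-array<boolean>, default None

Can choose to replace values other than NA. Return True for values that should be updated.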
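
A small sketch of how filter_func behaves, using made-up single-column frames rather than the data from the question. The callable receives the current values of each column in the frame being updated and returns True where an update is allowed. Here only cells that currently hold 0 are overwritten:

```python
import pandas as pd

# Hypothetical data: row 1 already holds a non-zero count we want to keep.
df1 = pd.DataFrame({"Co-Authored": [0, 3, 0, 0]})
df2 = pd.DataFrame({"Co-Authored": [5, 6]})

# Rows are still matched by index label; filter_func only decides
# which of the matched cells may be overwritten.
df1.update(df2, filter_func=lambda current: current == 0)

print(df1)
```

Row 0 is updated to 5, row 1 keeps its 3 because the filter returns False there, and rows 2 and 3 are untouched because df2 has no rows with those labels.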
