Question: I have two dataframes, df1 and df2. My goal is to modify df1 by replacing some of its values with values looked up in df2.
import pandas as pd

# dataframe 1
data = {'A': [90, 20, 30, 25, 50, 60],
        'B': ['qq', 'ee', 'rr', 'tt', 'ii', 'oo'],
        'C': ['XX', 'VV', 'BB', 'NN', 'KK', 'JJ']}
df1 = pd.DataFrame(data)

# dataframe 2
convert_table = {'X': ['dd','ee','ff','gg','hh','ii','ll','mm','nn','oo','pp','qq','rr','ss','tt','uu'],
                 'Y': ['DD','VV','FF','GG','HH','KK','LL','MM','NN','JJ','PP','XX','BB','SS','NN','LL'],
                 'Z': [5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61]}
df2 = pd.DataFrame(convert_table)

# search values of df1 inside of df2 and replace values
for idx1, row1 in df1.iterrows():
    for idx2, row2 in df2.iterrows():
        if row1['B'] == row2['X'] and row1['C'] == row2['Y']:
            df1.replace(to_replace=row1['B'], value=row2['Z'], inplace=True)
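As you can see, I have two for loops and check whether each row of df1 (row1) is found inside df2. If this condition is met, I replace the value in row1['B'] with row2['Z'].
This gives me the following result, which is exactly what I want: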
In [120]: df1
Out[120]:
    A   B   C
0  90  43  XX
1  20   7  VV
2  30  47  BB
3  25  59  NN
4  50  19  KK
5  60  37  JJ
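Note how column B has changed.
Question: Can you suggest a better way to write this code? I would like it to be as fast as possible, using built-in functions provided by pandas or Python.
Note: the data in the dataframes is for demonstration purposes only.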
Answer 0 (score: 3)
Use merge on the two columns:
df1.merge(df2, left_on=['B','C'], right_on=['X','Y'], how='left')
how='left' is crucial here: it keeps every row of df1 even when no match exists in df2. If you don't see why, read "Brief primer on merge methods (relational algebra)".
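To make the point about how='left' concrete, here is a minimal sketch on two tiny made-up frames (the names left, right and keys are illustrative, not part of the question). It only compares how many rows survive a left join versus the default inner join.

```python
import pandas as pd

# Tiny illustrative frames (made-up data, not the question's frames).
left = pd.DataFrame({'B': ['qq', 'ii'], 'C': ['XX', 'KK']})
right = pd.DataFrame({'X': ['qq'], 'Y': ['XX'], 'Z': [43]})

keys = dict(left_on=['B', 'C'], right_on=['X', 'Y'])

# Left join: every row of `left` survives; unmatched rows get NaN in X, Y, Z.
left_join = left.merge(right, how='left', **keys)

# Inner join (the default): rows without a match are dropped.
inner_join = left.merge(right, how='inner', **keys)

print(len(left_join), len(inner_join))
```

With an inner join the unmatched row ('ii', 'KK') would silently disappear, which is usually not what you want when the goal is to update df1 in place.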
I'll modify your example so that df1 contains an entry that does not exist in df2, namely ('ii','KK'):
In [1]:
# dataframe 2
convert_table = {'X': ['dd','ee','ff','gg','hh','ll','mm','nn','oo','pp','qq','rr','ss','tt','uu'],
                 'Y': ['DD','VV','FF','GG','HH','LL','MM','NN','JJ','PP','XX','BB','SS','NN','LL'],
                 'Z': [5,7,11,13,17,19,23,29,37,41,43,47,53,59,61]}
df2 = pd.DataFrame(convert_table)
In [2]: merged = df1.merge(df2, left_on=['B','C'], right_on=['X','Y'], how='left')
merged
Out[2]:
    A   B   C    X    Y     Z
0  90  qq  XX   qq   XX  43.0
1  20  ee  VV   ee   VV   7.0
2  30  rr  BB   rr   BB  47.0
3  25  tt  NN   tt   NN  59.0
4  50  ii  KK  NaN  NaN   NaN
5  60  oo  JJ   oo   JJ  37.0
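Now retrieve the final dataframe:
In [3]:
# .ix was removed in pandas 1.0; .loc is the label/boolean-based replacement
mask = merged.Z.notnull()
merged['B'] = merged['B'].astype(object)  # B will hold both strings and ints
# Z became float because of the NaN, so cast the matched values back to int
merged.loc[mask, 'B'] = merged.loc[mask, 'Z'].astype(int)
merged = merged[['A','B','C']]
merged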
Out[3]:
    A   B   C
0  90  43  XX
1  20   7  VV
2  30  47  BB
3  25  59  NN
4  50  ii  KK
5  60  37  JJ
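For reuse, the merge-then-assign steps above could be wrapped in one function. The following is only a sketch: the helper name replace_from_lookup is hypothetical, and the validate='many_to_one' argument is an addition not in the answer, included on the assumption that each (X, Y) pair appears at most once in df2 (so the merged result has exactly one row per row of df1).

```python
import pandas as pd

def replace_from_lookup(df, lookup):
    """Hypothetical helper: set df['B'] to lookup['Z'] where (B, C) == (X, Y)."""
    merged = df.merge(lookup, left_on=['B', 'C'], right_on=['X', 'Y'],
                      how='left', validate='many_to_one')
    # Boolean array marking the rows of df that found a match in lookup.
    matched = merged['Z'].notna().to_numpy()
    out = df.copy()
    out['B'] = out['B'].astype(object)  # column will mix strings and ints
    # Assign by position (NumPy arrays) so the index of df does not matter.
    out.loc[matched, 'B'] = merged.loc[matched, 'Z'].astype(int).to_numpy()
    return out

df1 = pd.DataFrame({'A': [90, 20, 30, 25, 50, 60],
                    'B': ['qq', 'ee', 'rr', 'tt', 'ii', 'oo'],
                    'C': ['XX', 'VV', 'BB', 'NN', 'KK', 'JJ']})
# Lookup table from the answer, i.e. without the ('ii', 'KK') entry.
df2 = pd.DataFrame({
    'X': ['dd','ee','ff','gg','hh','ll','mm','nn','oo','pp','qq','rr','ss','tt','uu'],
    'Y': ['DD','VV','FF','GG','HH','LL','MM','NN','JJ','PP','XX','BB','SS','NN','LL'],
    'Z': [5,7,11,13,17,19,23,29,37,41,43,47,53,59,61]})

result = replace_from_lookup(df1, df2)
print(result)
```

Unlike the original loop, this leaves df1 untouched and returns a new frame; the unmatched row keeps its original 'ii' value.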