I have two pandas DataFrames. The first is:
df1 = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"]})
The second DataFrame is:
df2 = pd.DataFrame({"val1" : ["A1","A1","A1","B2","B2","B2"],
"val2" : [10, 13, 16, 11, 20, 22]})
I want to merge the two, keeping the row order of df1 and having the values from df2 follow that order. Ideally, the result would look like this:
df_final = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"],
"val2" : [11, 10, 20, 13, 22, 16]})
I tried the merge function with left_on and right_on, but I did not get the output I was looking for. Any help would be appreciated.
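A minimal sketch of why a merge on val1 alone falls short, assuming the attempt looked roughly like the call below (the exact call is not given in the question). Every df1 row matches all three df2 rows that share its val1, so the result is a many-to-many join rather than a one-to-one pairing.

```python
import pandas as pd

df1 = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "B2", "A1"]})
df2 = pd.DataFrame({"val1": ["A1", "A1", "A1", "B2", "B2", "B2"],
                    "val2": [10, 13, 16, 11, 20, 22]})

# Assumed form of the attempted merge: val1 is the only key, and it is
# repeated three times on each side, so each df1 row matches three df2 rows.
naive = pd.merge(df1, df2, left_on="val1", right_on="val1", how="left")

print(len(df1), len(naive))  # 6 18
```

The missing ingredient is a second key that says which occurrence of each val1 a row is.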
Answer 0 (score: 1)
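You can do it this way:
- Sort df2 by ['val1', 'val2'], group it by val1, and store the result as g2.
- Add an idx column to df1 so that the matching row can be selected from df2.
Code: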
In [176]: df1['idx'] = 1
In [177]: df1['idx'] = df1.groupby('val1')['idx'].cumsum()-1
In [178]: df1
Out[178]:
val1 idx
0 B2 0
1 A1 0
2 B2 1
3 A1 1
4 B2 2
5 A1 2
In [179]: g2 = df2.sort_values(['val1', 'val2']).groupby('val1')
In [180]: g2.groups
Out[180]: {'A1': [0, 1, 2], 'B2': [3, 4, 5]}
In [181]: df2.loc[g2.groups['A1'][1]]
Out[181]:
val1 A1
val2 13
Name: 1, dtype: object
In [182]: df1.apply(lambda x: df2.loc[g2.groups[x['val1']][x['idx']]], axis=1)
Out[182]:
val1 val2
0 B2 11
1 A1 10
2 B2 20
3 A1 13
4 B2 22
5 A1 16
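The session above can be condensed into a standalone script. This is only a sketch with two substitutions of my own: groupby().cumcount() replaces the ones-column and cumsum step, and a list comprehension with .loc replaces the row-wise apply. The .groups mapping holds index labels, which is why .loc is used for the lookup.

```python
import pandas as pd

df1 = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "B2", "A1"]})
df2 = pd.DataFrame({"val1": ["A1", "A1", "A1", "B2", "B2", "B2"],
                    "val2": [10, 13, 16, 11, 20, 22]})

# Position of each df1 row within its val1 group (0, 1, 2, ...).
idx = df1.groupby("val1").cumcount()

# Index labels of the df2 rows for each val1, after sorting by val1 then val2.
groups = df2.sort_values(["val1", "val2"]).groupby("val1").groups

# For every df1 row, take the idx-th label of its group, then fetch those rows.
labels = [groups[v][i] for v, i in zip(df1["val1"], idx)]
result = df2.loc[labels].reset_index(drop=True)

print(result)
```

This assumes df2 has at least as many rows per val1 as df1 does; otherwise the label lookup raises an IndexError.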
Answer 1 (score: 0)
You can use groupby/cumcount to assign each row a number that is unique within its group:
df1['cumcount'] = df1.groupby('val1').cumcount()
# val1 cumcount
# 0 B2 0
# 1 A1 0
# 2 B2 1
# 3 A1 1
# 4 B2 2
# 5 A1 2
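As a quick check, cumcount gives the same numbering as building a column of ones and taking a per-group cumulative sum minus one. The sketch below compares the two; the column name one is just an illustrative choice.

```python
import pandas as pd

df1 = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "B2", "A1"]})

# cumcount numbers the rows of each group 0, 1, 2, ... in order of appearance.
by_cumcount = df1.groupby("val1").cumcount()

# The same numbering by hand: a column of ones, summed cumulatively per group,
# shifted down by one so that counting starts at zero.
by_cumsum = df1.assign(one=1).groupby("val1")["one"].cumsum() - 1

print(by_cumcount.tolist())  # [0, 0, 1, 1, 2, 2]
print(by_cumsum.tolist() == by_cumcount.tolist())  # True
```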
If we do the same for df2:
df2['cumcount'] = df2.groupby('val1').cumcount()
# val1 val2 cumcount
# 0 A1 10 0
# 1 A1 13 1
# 2 A1 16 2
# 3 B2 11 0
# 4 B2 20 1
# 5 B2 22 2
Merging df1 with df2 on their common columns (val1 and cumcount) then produces the desired result:
import pandas as pd
df1 = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"]})
df2 = pd.DataFrame({"val1" : ["A1","A1","A1","B2","B2","B2"],
"val2" : [10, 13, 16, 11, 20, 22]})
df_final = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"],
"val2" : [11, 10, 20, 13, 22, 16]})
df1['cumcount'] = df1.groupby('val1').cumcount()
df2['cumcount'] = df2.groupby('val1').cumcount()
result = pd.merge(df1, df2, on=['val1', 'cumcount'], how='left')
result = result.drop('cumcount', axis=1)
print(result)
assert result.equals(df_final)
yields
val1 val2
0 B2 11
1 A1 10
2 B2 20
3 A1 13
4 B2 22
5 A1 16
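Note that merging with how='left' keeps the row order of the first DataFrame, df1. Because each (val1, cumcount) pair occurs at most once in df2, the result also has the same number of rows as df1.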
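The approach can be wrapped in a small helper that leaves both inputs untouched. This is a sketch under stated assumptions: merge_in_order and the temporary column name _occurrence are hypothetical names, not part of pandas, and neither frame is assumed to already contain a column called _occurrence. The validate="one_to_one" argument makes pandas raise if the key pairs are not unique on both sides.

```python
import pandas as pd

def merge_in_order(left, right, key):
    """Pair the n-th occurrence of `key` in `left` with the n-th one in `right`.

    Hypothetical helper for illustration; not a pandas function.
    """
    counter = "_occurrence"  # assumed not to clash with an existing column
    # assign() returns new frames, so the caller's DataFrames are not modified.
    left_numbered = left.assign(**{counter: left.groupby(key).cumcount()})
    right_numbered = right.assign(**{counter: right.groupby(key).cumcount()})
    merged = pd.merge(left_numbered, right_numbered, on=[key, counter],
                      how="left", validate="one_to_one")
    return merged.drop(columns=counter)

df1 = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "B2", "A1"]})
# Deliberately one A1 row short, to show what the left merge does in that case.
df2_short = pd.DataFrame({"val1": ["A1", "A1", "B2", "B2", "B2"],
                          "val2": [10, 13, 11, 20, 22]})

result = merge_in_order(df1, df2_short, "val1")
print(result)
```

With df2_short, the third A1 row of df1 has no partner, so its val2 is missing (NaN) and the column becomes floating point, while the row count and order still match df1.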