Python Pandas - Merge two DataFrames based on index order

Time: 2016-04-03 20:41:05

Tags: python pandas

I have two pandas DataFrames. The first is:

df1 = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"]})

The second DataFrame is:

df2 = pd.DataFrame({"val1" : ["A1","A1","A1","B2","B2","B2"],
                    "val2" : [10, 13, 16, 11, 20, 22]})

I want to merge the two, keeping the row order of df1 and having the values from df2 follow that order. Ideally, the result would look like this:

df_final = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"],
                         "val2" : [11, 10, 20, 13, 22, 16]})

I tried the merge function with left_on and right_on, but I did not get the output I am looking for. Any help would be greatly appreciated.
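
For reference, a minimal sketch of what a plain merge on val1 does with the data above, which is probably the output that falls short here. Each val1 value occurs three times in both frames, so every row of df1 pairs with three rows of df2. The variable name naive is illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "B2", "A1"]})
df2 = pd.DataFrame({"val1": ["A1", "A1", "A1", "B2", "B2", "B2"],
                    "val2": [10, 13, 16, 11, 20, 22]})

# Merging on val1 alone pairs each df1 row with every df2 row sharing its val1.
naive = pd.merge(df1, df2, left_on="val1", right_on="val1")
print(len(naive))  # 18: 6 rows in df1, 3 matches each
```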

2 Answers:

Answer 0: (score: 1)

You can do it this way:

  1. Sort the values in df2[['val1', 'val2']], group them by val1, and store the result as g2.
  2. Add an idx column to df1, used to select the matching values from df2.

Code:

    In [176]: df1['idx'] = 1
    
    In [177]: df1['idx'] = df1.groupby('val1')['idx'].cumsum()-1
    
    In [178]: df1
    Out[178]:
      val1  idx
    0   B2    0
    1   A1    0
    2   B2    1
    3   A1    1
    4   B2    2
    5   A1    2
    
    In [179]: g2 = df2.sort_values(['val1', 'val2']).groupby('val1')
    
    In [180]: g2.groups
    Out[180]: {'A1': [0, 1, 2], 'B2': [3, 4, 5]}
    
    In [181]: df2.loc[g2.groups['A1'][1]]  # g2.groups holds index labels, so select with .loc
    Out[181]:
    val1    A1
    val2    13
    Name: 1, dtype: object
    
    In [182]: df1.apply(lambda x: df2.loc[g2.groups[x['val1']][x['idx']]], axis=1)
    Out[182]:
      val1  val2
    0   B2    11
    1   A1    10
    2   B2    20
    3   A1    13
    4   B2    22
    5   A1    16
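
The same steps can be sketched as a self-contained script. This is one possible rendering, not the answer's exact code: it uses cumcount for the idx column and a plain dict of index labels in place of g2.groups, and the names labels and picked are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "B2", "A1"]})
df2 = pd.DataFrame({"val1": ["A1", "A1", "A1", "B2", "B2", "B2"],
                    "val2": [10, 13, 16, 11, 20, 22]})

# Position of each df1 row within its val1 group (0, 1, 2, ...).
idx = df1.groupby("val1").cumcount()

# Index labels of df2's rows for each val1 value, ordered by val2.
labels = {key: list(group.index)
          for key, group in df2.sort_values(["val1", "val2"]).groupby("val1")}

# For each df1 row, pick the idx-th df2 label from the matching group.
picked = [labels[v][i] for v, i in zip(df1["val1"], idx)]

# Select by label (.loc) and renumber the rows to match df1.
result = df2.loc[picked].reset_index(drop=True)
print(result)
```

Building the list of labels once and selecting with a single .loc call avoids running a Python function per row through apply.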
    

Answer 1: (score: 0)

You can use groupby/cumcount to assign a unique number to each row within each group:

df1['cumcount'] = df1.groupby('val1').cumcount()
#   val1  cumcount
# 0   B2         0
# 1   A1         0
# 2   B2         1
# 3   A1         1
# 4   B2         2
# 5   A1         2

If we do the same for df2:

df2['cumcount'] = df2.groupby('val1').cumcount()
#   val1  val2  cumcount
# 0   A1    10         0
# 1   A1    13         1
# 2   A1    16         2
# 3   B2    11         0
# 4   B2    20         1
# 5   B2    22         2
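
Note that cumcount numbers rows in the order they currently appear within each group. The sketch below assumes a shuffled copy of df2 (df2_shuffled is made up for illustration) and assumes the numbering should follow ascending val2, as the sort in Answer 0 does; in that case, sort before calling cumcount.

```python
import pandas as pd

# Same rows as df2, but shuffled (illustrative assumption).
df2_shuffled = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "A1", "B2"],
                             "val2": [22, 16, 11, 10, 13, 20]})

# Without sorting, the numbering follows the shuffled row order.
unsorted_count = df2_shuffled.groupby("val1").cumcount()

# Sorting first makes the numbering follow val2 within each val1 group.
df2_sorted = df2_shuffled.sort_values(["val1", "val2"]).reset_index(drop=True)
df2_sorted["cumcount"] = df2_sorted.groupby("val1").cumcount()
print(df2_sorted)
```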

Then merging df1 and df2 on their common columns (val1 and cumcount) produces the desired result:

import pandas as pd

df1 = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"]})
df2 = pd.DataFrame({"val1" : ["A1","A1","A1","B2","B2","B2"],
                    "val2" : [10, 13, 16, 11, 20, 22]})
df_final = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"],
                         "val2" : [11, 10, 20, 13, 22, 16]})

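# Number each row within its val1 group in both frames.
df1['cumcount'] = df1.groupby('val1').cumcount()
df2['cumcount'] = df2.groupby('val1').cumcount()
# Left merge on the shared columns; the keys are spelled out for clarity.
result = pd.merge(df1, df2, on=['val1', 'cumcount'], how='left')
# The helper column is no longer needed once the rows are matched.
result = result.drop('cumcount', axis=1)
print(result)
assert result.equals(df_final)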

yields

  val1  val2
0   B2    11
1   A1    10
2   B2    20
3   A1    13
4   B2    22
5   A1    16

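Note that merging with how='left' keeps the rows in the same order as the first DataFrame, df1. Because each (val1, cumcount) pair occurs at most once in df2, the result also has the same number of rows as df1.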
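
As a hedged sketch of how to check that row-count property, merge accepts a validate argument; with validate="one_to_one" pandas raises pandas.errors.MergeError when a key combination repeats on either side. The flag name raised is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "B2", "A1"]})
df2 = pd.DataFrame({"val1": ["A1", "A1", "A1", "B2", "B2", "B2"],
                    "val2": [10, 13, 16, 11, 20, 22]})

df1["cumcount"] = df1.groupby("val1").cumcount()
df2["cumcount"] = df2.groupby("val1").cumcount()

# With the cumcount key, each (val1, cumcount) pair is unique on both sides.
result = pd.merge(df1, df2, on=["val1", "cumcount"], how="left",
                  validate="one_to_one")
assert len(result) == len(df1)

# On val1 alone the keys repeat, so the same check raises MergeError.
try:
    pd.merge(df1, df2, on="val1", how="left", validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```

If df1 had more rows for some val1 than df2, the left merge would still keep those rows, with NaN in val2.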