Question

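I have two DataFrames, A and B, that share a common index. The index labels may appear more than once (duplicates) in both A and B.

I want to merge A and B in the following three ways: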

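Case 0: If index i appears once in A (i1) and once in B (i1), I want the merged-by-index DataFrame to contain the row A(i1), B(i1).
Case 1: If index i appears once in A (i1) and twice in B, in the order (i1, i2), I want the merged-by-index DataFrame to contain the rows A(i1), B(i1) and A(i1), B(i2).
Case 2: If index i appears twice in A, in the order (i1, i2), and twice in B, in the order (i1, i2), I want the merged-by-index DataFrame to contain the rows A(i1), B(i1) and A(i2), B(i2).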

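These three cases cover every situation that can occur in my data.

With pandas.merge, cases 0 and 1 work. For case 2, however, the returned DataFrame contains the rows A(i1), B(i1) and A(i1), B(i2) and A(i2), B(i1) and A(i2), B(i2) instead of A(i1), B(i1) and A(i2), B(i2).

I could use pandas.merge and then delete the unwanted merged rows, but is there a way to handle all 3 cases in a single merge?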

import pandas as pd

A = pd.DataFrame([[1, 2], [4, 2], [5, 5], [5, 5], [1, 1]], index=['a', 'a', 'b', 'c', 'c'])
B = pd.DataFrame([[1, 5], [4, 8], [7, 7], [5, 5]], index=['b', 'c', 'a', 'a'])
pd.merge(A, B, left_index=True, right_index=True, how='inner')

For example, in the merge result above, I do not want the second and third rows with index 'a'.
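
To make the problem concrete, here is a minimal sketch that counts how many rows the plain index merge produces per label. The name `rows_per_label` is only illustrative and is not part of the original question.

```python
import pandas as pd

A = pd.DataFrame([[1, 2], [4, 2], [5, 5], [5, 5], [1, 1]],
                 index=['a', 'a', 'b', 'c', 'c'])
B = pd.DataFrame([[1, 5], [4, 8], [7, 7], [5, 5]],
                 index=['b', 'c', 'a', 'a'])

# A plain inner merge on the index pairs every 'a' row of A
# with every 'a' row of B.
merged = pd.merge(A, B, left_index=True, right_index=True, how='inner')

# Number of merged rows per index label.
rows_per_label = merged.index.value_counts().sort_index()
print(rows_per_label)
```

Label 'a' occurs twice in both frames, so the merge yields 2 x 2 = 4 rows for it, whereas case 2 asks for only 2.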

Answer 1

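Essentially, your 3 cases boil down to 2:

Index i occurs the same number of times (once or twice) in A and in B; merge the rows in order.
Index i occurs twice in A and once in B; merge every row with the content of B.

Setup code: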

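import pandas as pd

def add_secondary_index(df):
    # Number the rows within each index label (0, 1, ...) and append that
    # counter as a second index level, so duplicates can be paired in order.
    # Note: this modifies df in place.
    df.index.name = 'Old'
    df['Order'] = df.groupby(df.index).cumcount()
    df.set_index('Order', append=True, inplace=True)
    return df

A = pd.DataFrame([[1, 2], [4, 2], [5, 5], [5, 5], [1, 1]], index=['a', 'a', 'b', 'c', 'c'])
B = pd.DataFrame([[1, 5], [4, 8], [7, 7], [5, 5]], index=['b', 'c', 'a', 'a'])
# True where a label occurs the same number of times in A and in B.
index_times = A.groupby(A.index).count() == B.groupby(B.index).count()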
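
The count comparison above compares two grouped frames directly, which works here because A and B contain the same set of labels. As a hedged alternative, assuming the two frames might not share every label, the counts could be compared only on the labels they have in common. The names `counts_a`, `counts_b`, `common` and `same_count` are illustrative and not part of the original answer.

```python
import pandas as pd

A = pd.DataFrame([[1, 2], [4, 2], [5, 5], [5, 5], [1, 1]],
                 index=['a', 'a', 'b', 'c', 'c'])
B = pd.DataFrame([[1, 5], [4, 8], [7, 7], [5, 5]],
                 index=['b', 'c', 'a', 'a'])

# How often each label occurs in each frame.
counts_a = A.index.value_counts()
counts_b = B.index.value_counts()

# Restrict the comparison to labels present in both frames.
common = counts_a.index.intersection(counts_b.index)
same_count = counts_a.loc[common] == counts_b.loc[common]
print(same_count)
```

Labels where `same_count` is True fall into the first case (pair rows in order); the rest fall into the second.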

The first case is easy to handle: you only need to add a secondary index:

# Labels that occur equally often in A and B.
same_times_index = index_times[index_times[0].values].index
A_same = A.loc[same_times_index].copy()
B_same = B.loc[same_times_index].copy()
# Add the (label, order) index to both copies, then merge on both levels.
add_secondary_index(A_same)
add_secondary_index(B_same)
result_merge_same = pd.merge(A_same, B_same, left_index=True, right_index=True)

The second case needs to be handled separately:

# Labels whose counts differ between A and B; a plain index merge is enough.
not_same_times_index = index_times[~index_times.index.isin(same_times_index)].index
A_notsame = A.loc[not_same_times_index].copy()
B_notsame = B.loc[not_same_times_index].copy()
result_merge_notsame = pd.merge(A_notsame, B_notsame, left_index=True, right_index=True)

You can then decide whether to add a secondary index to result_merge_notsame or to drop the secondary index from result_merge_same.
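
One way to put the two partial results together is sketched below. It takes the second option (dropping the secondary index) and concatenates the pieces. The helper `with_order` and the variable names are hypothetical; it is a non-mutating variant of the approach above, not the answer's exact code.

```python
import pandas as pd

A = pd.DataFrame([[1, 2], [4, 2], [5, 5], [5, 5], [1, 1]],
                 index=['a', 'a', 'b', 'c', 'c'])
B = pd.DataFrame([[1, 5], [4, 8], [7, 7], [5, 5]],
                 index=['b', 'c', 'a', 'a'])

def with_order(df):
    # Return a copy indexed by (label, position of the row within its label).
    out = df.copy()
    out.index.name = 'Old'
    out['Order'] = out.groupby(level=0).cumcount()
    return out.set_index('Order', append=True)

# Split the shared labels by whether they occur equally often in both frames.
counts_a = A.index.value_counts()
counts_b = B.index.value_counts()
common = counts_a.index.intersection(counts_b.index)
same = common[(counts_a.loc[common] == counts_b.loc[common]).to_numpy()]
different = common.difference(same)

# Equal counts: pair rows by (label, order), then drop the helper level.
merged_same = pd.merge(with_order(A[A.index.isin(same)]),
                       with_order(B[B.index.isin(same)]),
                       left_index=True, right_index=True)
merged_same = merged_same.reset_index('Order', drop=True)

# Different counts: the ordinary index merge already gives the wanted rows.
merged_diff = pd.merge(A[A.index.isin(different)],
                       B[B.index.isin(different)],
                       left_index=True, right_index=True)

result = pd.concat([merged_same, merged_diff])
print(result)
```

With the sample data this leaves two rows for 'a' (paired in order), one for 'b', and two for 'c' (each A row combined with the single B row).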

Merge (or join) two DataFrames by index when the index contains duplicates