Compare two DataFrames based on a combination of two columns in pandas

Time: 2020-01-11 16:11:01

Tags: pandas pandas-groupby

I have two DataFrames, shown below.

df2 is an updated version of df1.

df1:

Sector      Plot         Status
SE1         1            UnderConstruction
SE1         2            Constructed
SE1         3            UnderConstruction
SE2         1            Constructed
SE2         2            Constructed
SE2         3            Developed

df2:

Sector      Plot         Status
SE1         1            Constructed
SE1         2            Constructed
SE1         3            Developed
SE2         1            Constructed
SE2         2            Developed
SE2         3            Developed
SE3         1            Developed

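I want to compare the two tables above on the combination of Sector and Plot, and create a new table containing only the rows whose Status changed, as shown below.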

Sector      Plot         NewStatus         PreviousStatus
SE1         1            Constructed       UnderConstruction
SE1         3            Developed         UnderConstruction
SE2         2            Developed         Constructed

1 answer:

Answer 0: (score: 1)

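Use merge with an outer join on the first two columns, then filter out the rows whose status is unchanged and use DataFrame.dropna to remove the rows with missing values (keys that exist in only one of the two DataFrames):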

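# Align both versions on (Sector, Plot); the suffixes mark which frame each Status came from
df = df2.merge(df1, how='outer', on=['Sector','Plot'], suffixes=('_new','_prev'))
# Keep rows whose status differs, then drop keys that are missing from either frame
df = df[df['Status_new'].ne(df['Status_prev'])].dropna(subset=['Status_new','Status_prev'])
print(df)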
  Sector  Plot   Status_new        Status_prev
0    SE1     1  Constructed  UnderConstruction
2    SE1     3    Developed  UnderConstruction
4    SE2     2    Developed        Constructed
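To reproduce this end to end, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the tables in the question and renames the suffixed columns to the NewStatus / PreviousStatus headers the question asks for. The rename and reset_index steps are additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Sector": ["SE1", "SE1", "SE1", "SE2", "SE2", "SE2"],
    "Plot": [1, 2, 3, 1, 2, 3],
    "Status": ["UnderConstruction", "Constructed", "UnderConstruction",
               "Constructed", "Constructed", "Developed"],
})
df2 = pd.DataFrame({
    "Sector": ["SE1", "SE1", "SE1", "SE2", "SE2", "SE2", "SE3"],
    "Plot": [1, 2, 3, 1, 2, 3, 1],
    "Status": ["Constructed", "Constructed", "Developed",
               "Constructed", "Developed", "Developed", "Developed"],
})

# Outer join on the key columns; unmatched keys get NaN on one side
merged = df2.merge(df1, how="outer", on=["Sector", "Plot"],
                   suffixes=("_new", "_prev"))

# Keep rows where the status differs, then drop keys present in only one frame
changed = merged[merged["Status_new"].ne(merged["Status_prev"])]
changed = changed.dropna(subset=["Status_new", "Status_prev"])

# Assumption: rename to the column headers used in the question's desired table
result = (changed
          .rename(columns={"Status_new": "NewStatus",
                           "Status_prev": "PreviousStatus"})
          .reset_index(drop=True))
print(result)
```

The SE3 / Plot 1 row exists only in df2, so its Status_prev is NaN after the merge and dropna removes it.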

You can wrap this in a function like the following:

def compare(df1, df2, on, comp):
    df = df2.merge(df1, how='outer', on=on, suffixes=('_new','_prev'))
    return (df[df[f'{comp}_new'].ne(df[f'{comp}_prev'])]
                .dropna(subset=[f'{comp}_new',f'{comp}_prev']))


df = compare(df1, df2, ['Sector','Plot'], 'Status')
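Because rows whose key appears in only one DataFrame are dropped anyway, an inner join may give the same result without the dropna step. The sketch below is a hypothetical variant (the name compare_inner and the small sample frames are illustrative, not from the original answer), and it assumes the compared column contains no missing values in either input.

```python
import pandas as pd

def compare_inner(old, new, on, comp):
    """Hypothetical variant: return rows whose `comp` value changed between old and new."""
    # An inner join keeps only key combinations present in both frames
    df = new.merge(old, how="inner", on=on, suffixes=("_new", "_prev"))
    # Keep only the rows where the compared column differs
    return df[df[f"{comp}_new"].ne(df[f"{comp}_prev"])]

# Small illustrative frames (a subset of the question's data)
old = pd.DataFrame({"Sector": ["SE1", "SE1"], "Plot": [1, 2],
                    "Status": ["UnderConstruction", "Constructed"]})
new = pd.DataFrame({"Sector": ["SE1", "SE1", "SE3"], "Plot": [1, 2, 1],
                    "Status": ["Constructed", "Constructed", "Developed"]})

out = compare_inner(old, new, ["Sector", "Plot"], "Status")
print(out)
```

If the compared column can itself contain NaN, the two versions may differ: ne treats NaN as unequal to NaN, so such rows would be kept here, while the dropna version removes them.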