I have two DataFrames as shown below, where df2 is an updated version of df1.
df1:
Sector Plot Status
SE1 1 UnderConstruction
SE1 2 Constructed
SE1 3 UnderConstruction
SE2 1 Constructed
SE2 2 Constructed
SE2 3 Developed
df2:
Sector Plot Status
SE1 1 Constructed
SE1 2 Constructed
SE1 3 Developed
SE2 1 Constructed
SE2 2 Developed
SE2 3 Developed
SE3 1 Developed
I want to compare the two tables above and create a new table containing only the plots whose status changed, as shown below.
Sector Plot NewStatus PreviousStatus
SE1 1 Constructed UnderConstruction
SE1 3 Developed UnderConstruction
SE2 2 Developed Constructed
Answer 0 (score: 1)
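Use merge with an outer join on the first two columns, then filter out the rows whose status is unchanged and use DataFrame.dropna to drop the rows that have a missing value on either side (plots that exist in only one of the two DataFrames):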
df = df2.merge(df1, how='outer', on=['Sector','Plot'], suffixes=('_new','_prev'))
df = df[df['Status_new'].ne(df['Status_prev'])].dropna(subset=['Status_new','Status_prev'])
print(df)
Sector Plot Status_new Status_prev
0 SE1 1 Constructed UnderConstruction
2 SE1 3 Developed UnderConstruction
4 SE2 2 Developed Constructed
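Since plots that appear in only one DataFrame are dropped anyway, an inner join should give the same rows here without the dropna step. The following is a minimal sketch under that assumption; the DataFrame construction is reconstructed from the tables in the question, and the names merged and changed are illustrative only. Note that the two approaches can differ if Status itself contains missing values in either frame.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question
df1 = pd.DataFrame({
    'Sector': ['SE1', 'SE1', 'SE1', 'SE2', 'SE2', 'SE2'],
    'Plot': [1, 2, 3, 1, 2, 3],
    'Status': ['UnderConstruction', 'Constructed', 'UnderConstruction',
               'Constructed', 'Constructed', 'Developed'],
})
df2 = pd.DataFrame({
    'Sector': ['SE1', 'SE1', 'SE1', 'SE2', 'SE2', 'SE2', 'SE3'],
    'Plot': [1, 2, 3, 1, 2, 3, 1],
    'Status': ['Constructed', 'Constructed', 'Developed',
               'Constructed', 'Developed', 'Developed', 'Developed'],
})

# An inner join keeps only the (Sector, Plot) pairs present in both frames,
# so no row can have a missing status caused by the join itself.
merged = df2.merge(df1, how='inner', on=['Sector', 'Plot'],
                   suffixes=('_new', '_prev'))

# Keep only the rows whose status actually changed
changed = merged[merged['Status_new'].ne(merged['Status_prev'])]
print(changed)
```

The outer join in the answer is still the better starting point if you later want to report plots that were added or removed, because those rows survive the merge until dropna is applied.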
You can wrap this in a function like the following:
def compare(df1, df2, on, comp):
    df = df2.merge(df1, how='outer', on=on, suffixes=('_new','_prev'))
    return (df[df[f'{comp}_new'].ne(df[f'{comp}_prev'])]
              .dropna(subset=[f'{comp}_new',f'{comp}_prev']))
df = compare(df1, df2, ['Sector','Plot'], 'Status')
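The question asks for columns named NewStatus and PreviousStatus, while the merge produces Status_new and Status_prev. One possible way to close that gap is to rename the suffixed columns and reset the index afterwards. This is only a sketch: compare_renamed is a hypothetical variant of the function above, and the small old and new frames are made-up sample data, not the data from the question.

```python
import pandas as pd

def compare_renamed(old, new, on, comp):
    """Hypothetical variant of compare() that renames the result columns."""
    merged = new.merge(old, how='outer', on=on, suffixes=('_new', '_prev'))
    # Keep rows where the compared column differs between the two frames
    changed = merged[merged[f'{comp}_new'].ne(merged[f'{comp}_prev'])]
    # Drop rows that exist in only one of the two frames
    changed = changed.dropna(subset=[f'{comp}_new', f'{comp}_prev'])
    return (changed
            .rename(columns={f'{comp}_new': f'New{comp}',
                             f'{comp}_prev': f'Previous{comp}'})
            .reset_index(drop=True))

# Small illustrative frames (not the question's data)
old = pd.DataFrame({'Sector': ['SE1', 'SE1'],
                    'Plot': [1, 2],
                    'Status': ['UnderConstruction', 'Constructed']})
new = pd.DataFrame({'Sector': ['SE1', 'SE1', 'SE3'],
                    'Plot': [1, 2, 1],
                    'Status': ['Constructed', 'Constructed', 'Developed']})

result = compare_renamed(old, new, ['Sector', 'Plot'], 'Status')
print(result)
```

Building the new names from comp keeps the helper generic, so comparing a different column would yield correspondingly named output columns.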