I need to track the changes between two DataFrames:
from pandas import DataFrame

df1 = DataFrame({'A': ['a1', 'Changed', 'a3', 'Deleted'], 'B': ['b2', 'b2', 'b3', 'Deleted']})
df2 = DataFrame({'A': ['a1', 'a2', 'a3', 'Added'], 'B': ['b2', 'b2', 'Changed', 'Added']})
df1:
         A        B  df1
0       a1       b2    1
1  Changed       b2    1
2       a3       b3    1
3  Deleted  Deleted    1
df2:
       A        B  df2
0     a1       b2    1
1     a2       b2    1
2     a3  Changed    1
3  Added    Added    1
Desired output:
      A        B  df1  df2  Diff    Delta
  Added    Added    0    1    -1    Added
Changed       b2    1    0     1  Changed
     a2       b2    0    1    -1    Added
     a3  Changed    1    0     1  Changed
Deleted  Deleted    1    0     1  Deleted
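The cases to track are: any change to an existing record, an addition, or a deletion.
I tried marking each row's presence with a 1 in each DataFrame and doing an outer merge, then taking the difference between 'df1' and 'df2' and using that Diff to build the logic that assigns the Changed, Added, and Deleted labels. For example, Diff == -1 means 'df1' = 0 and 'df2' = 1, so the row is an addition.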
import numpy
import pandas

# Flag each row's presence in its source frame.
df1['df1'] = 1
df2['df2'] = 1
df = pandas.merge(df1,df2,on=['A','B'],how='outer')
df.fillna(0,inplace=True)
df['Diff'] = df['df1'] - df['df2']
df['Delta'] = numpy.nan
# Parenthesize the comparison: & binds tighter than !=.
df.loc[(df['A'].duplicated(keep=False))&(df['Diff']!=0),'Delta'] ='Changed'
df.loc[(df['Delta'].isnull())&(df['Diff']== (-1)),'Delta'] = 'Added'
df.loc[(df['Delta'].isnull())&(df['Diff']== (1)),'Delta'] = 'Deleted'
print(df.loc[df['Delta'].notnull()].drop_duplicates(subset='A'))
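This gives me:
I have added a column "Correct_Delta" showing what the correct value of the "Delta" column should be. The highlighted cells are the problematic ones.
PS: Column "A" is always unique. Columns "A" and "B" have a many-to-one relationship in some cases and a one-to-one relationship in others.
Can anyone help?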
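
For reference, the presence-flag step described above can also be sketched with the `indicator` argument of `pandas.merge`, assuming a reasonably recent pandas. This is only a sketch of the detection step, not a full solution: it separates rows found in just one frame from rows found in both, and the `Side` column name is made up for illustration. It does not attempt the "Changed" labelling.

```python
import pandas as pd

# Same sample data as in the question.
df1 = pd.DataFrame({'A': ['a1', 'Changed', 'a3', 'Deleted'],
                    'B': ['b2', 'b2', 'b3', 'Deleted']})
df2 = pd.DataFrame({'A': ['a1', 'a2', 'a3', 'Added'],
                    'B': ['b2', 'b2', 'Changed', 'Added']})

# indicator=True adds a '_merge' column whose values are
# 'left_only', 'right_only' or 'both'.
merged = pd.merge(df1, df2, on=['A', 'B'], how='outer', indicator=True)

# Rows present in both frames are unchanged; keep only the differences.
diff = merged[merged['_merge'] != 'both'].copy()

# Hypothetical 'Side' label (not the final 'Delta'): which frame the row came from.
diff['Side'] = diff['_merge'].astype(str).map(
    {'left_only': 'df1 only', 'right_only': 'df2 only'})

print(diff[['A', 'B', 'Side']])
```

Since "A" is stated to be unique, a value of "A" that shows up under both 'df1 only' and 'df2 only' would presumably be a candidate for "Changed", while the remaining rows would be deletions or additions.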