Tracking changes between two DataFrames

Date: 2018-09-17 14:18:19

Tags: python pandas

I need to track the changes between two DataFrames:

from pandas import DataFrame

df1 = DataFrame({'A': ['a1', 'Changed', 'a3', 'Deleted'], 'B': ['b2', 'b2', 'b3', 'Deleted']})
df2 = DataFrame({'A': ['a1', 'a2', 'a3', 'Added'], 'B': ['b2', 'b2', 'Changed', 'Added']})

df1:

    A        B        df1
0   a1       b2       1
1   Changed  b2       1
2   a3       b3       1
3   Deleted  Deleted  1

df2:

    A        B        df2
0   a1       b2       1
1   a2       b2       1
2   a3       Changed  1
3   Added    Added    1

Desired output:

A        B        df1  df2  Diff  Delta
Added    Added    0    1    -1    Added
Changed  b2       1    0     1    Changed
a2       b2       0    1    -1    Added
a3       Changed  1    0     1    Changed
Deleted  Deleted  1    0     1    Deleted

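The cases I need to track are: any change to an existing record, an addition, or a deletion.

I tried marking each row's presence in its DataFrame with a 1, doing an outer merge, taking the difference between 'df1' and 'df2', and then using that diff to build the logic that assigns Changed, Deleted, or Added. For example, diff == -1 means 'df1' = 0 and 'df2' = 1, so the row is an addition.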

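import pandas

# Flag each row's presence in its source frame, then outer-merge on both columns.
df1['df1'] = 1
df2['df2'] = 1
df = pandas.merge(df1, df2, on=['A', 'B'], how='outer')
df.fillna(0, inplace=True)
# Diff is 1 for rows only in df1, -1 for rows only in df2, 0 for rows in both.
df['Diff'] = df['df1'] - df['df2']
# Start with an all-null object column so string labels can be assigned to it.
df['Delta'] = None
# The comparison needs its own parentheses: & binds tighter than !=.
df.loc[df['A'].duplicated(keep=False) & (df['Diff'] != 0), 'Delta'] = 'Changed'
df.loc[df['Delta'].isnull() & (df['Diff'] == -1), 'Delta'] = 'Added'
df.loc[df['Delta'].isnull() & (df['Diff'] == 1), 'Delta'] = 'Deleted'
print(df.loc[df['Delta'].notnull()].drop_duplicates(subset='A'))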

This gives me:

[image] I added a 'Correct_Delta' column that shows what the 'Delta' column should contain. The highlighted cells are wrong.

PS: Column 'A' is always unique. Columns 'A' and 'B' have a many-to-one relationship in some cases and a one-to-one relationship in others.
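Since 'A' is described as unique, one way the three cases might be separated is to merge on 'A' alone and let pandas report which side each key came from. The following is only a rough sketch under that assumption: the helper name track_changes and the labels 'Unchanged', '_old', and '_new' are made up for illustration. It treats a change to 'A' itself as a deletion plus an addition, so the row whose 'A' is 'Changed' comes out as 'Deleted' rather than 'Changed' as in the desired output above.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': ['a1', 'Changed', 'a3', 'Deleted'],
                    'B': ['b2', 'b2', 'b3', 'Deleted']})
df2 = pd.DataFrame({'A': ['a1', 'a2', 'a3', 'Added'],
                    'B': ['b2', 'b2', 'Changed', 'Added']})


def track_changes(old, new, key='A', value='B'):
    """Hypothetical helper: label each key as Added, Deleted, or Changed."""
    # indicator=True adds a '_merge' column: 'left_only', 'right_only', or 'both'.
    merged = old.merge(new, on=key, how='outer',
                       suffixes=('_old', '_new'), indicator=True)
    in_both = merged['_merge'] == 'both'
    differs = merged[value + '_old'] != merged[value + '_new']
    conditions = [
        merged['_merge'] == 'left_only',   # key exists only in the old frame
        merged['_merge'] == 'right_only',  # key exists only in the new frame
        in_both & differs,                 # same key, different value
    ]
    merged['Delta'] = np.select(conditions,
                                ['Deleted', 'Added', 'Changed'],
                                default='Unchanged')
    changed_rows = merged.loc[merged['Delta'] != 'Unchanged']
    return changed_rows.drop(columns='_merge').reset_index(drop=True)


result = track_changes(df1, df2)
print(result)
```

Merging on the key alone sidesteps the duplicated() check, because a key present on both sides always lands in a single row with both 'B' values next to each other.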

Can anyone help?

0 answers:

No answers yet