Check the differences between two dataframes and create a new dataframe

Time: 2017-05-16 07:05:13

Tags: python pandas

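I need some guidance on Python pandas, since it is uncharted territory for a front-end developer like me. I am now familiar with the DataFrame concept. I want to create a new dataframe by comparing two other dataframes. What should I look for in pandas to do this?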

For example, take df1 as

 Date            col1     col2     col3     id
 2017-04-14      2482        1        0     a2
 2017-04-15      2483        1        0     a3

and df2 as

 Date            col1     col2     col3     id
 2017-04-15      2483       10       20     a3
 2017-04-14      2482       11        0     a2

What I want to achieve is a new dataframe containing the details of the values that differ:
 Date            df1_value    df2_value    diff_col_name    val_diff     id
 2017-04-14      1            11           col2             -10          a2
 2017-04-15      1            10           col2              -9          a3
 2017-04-15      0            20           col3             -20          a3

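I was able to join the two dataframes on id with df1.merge(df2, on='id', how='left'), but what should the next step be? How do I compare the differences and build the final dataframe?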
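One possible next step from that merge, as a minimal sketch: it assumes the value columns are exactly col1, col2 and col3, and it matches rows on both Date and id (the question merges on id only). The suffixes _df1/_df2 are an arbitrary choice for illustration. Each value column is compared on its own, and the mismatching rows are collected into the requested long format.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['2017-04-14', '2017-04-15'],
                    'col1': [2482, 2483], 'col2': [1, 1], 'col3': [0, 0],
                    'id': ['a2', 'a3']})
df2 = pd.DataFrame({'Date': ['2017-04-15', '2017-04-14'],
                    'col1': [2483, 2482], 'col2': [10, 11], 'col3': [20, 0],
                    'id': ['a3', 'a2']})

# Merge on both keys; overlapping value columns get _df1 / _df2 suffixes.
merged = df1.merge(df2, on=['Date', 'id'], how='left', suffixes=('_df1', '_df2'))

parts = []
for col in ['col1', 'col2', 'col3']:  # assumed list of value columns
    a = merged[col + '_df1']
    b = merged[col + '_df2']
    mask = a != b  # rows where this column differs
    if mask.any():
        parts.append(pd.DataFrame({
            'Date': merged.loc[mask, 'Date'],
            'df1_value': a[mask],
            'df2_value': b[mask],
            'diff_col_name': col,
            'val_diff': a[mask] - b[mask],
            'id': merged.loc[mask, 'id'],
        }))

# Stack the per-column pieces; fall back to an empty frame if nothing differs.
columns = ['Date', 'df1_value', 'df2_value', 'diff_col_name', 'val_diff', 'id']
result = pd.concat(parts, ignore_index=True) if parts else pd.DataFrame(columns=columns)
print(result)
```

Looping over columns keeps the logic explicit, but it needs the column list spelled out; the melt-based answer below avoids that by reshaping first.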

1 answer:

Answer 0 (score: 0)

Setup

import pandas as pd

df1 = pd.DataFrame({'Date': {0: '2017-04-14', 1: '2017-04-15'},
 'col1': {0: 2482, 1: 2483},
 'col2': {0: 1, 1: 1},
 'col3': {0: 0, 1: 0},
 'id': {0: 'a2', 1: 'a3'}})

df2 = pd.DataFrame({'Date': {0: '2017-04-15', 1: '2017-04-14'},
 'col1': {0: 2483, 1: 2482},
 'col2': {0: 10, 1: 11},
 'col3': {0: 20, 1: 0},
 'id': {0: 'a3', 1: 'a2'}})

Solution

# melt both frames from wide to long format and merge them on Date, id and column name
dfm = pd.merge(pd.melt(df1, id_vars=['Date', 'id']),
               pd.melt(df2, id_vars=['Date', 'id']),
               how='outer', on=['Date', 'id', 'variable'])

# rename columns (merge produced Date, id, variable, value_x, value_y)
dfm.columns = ['Date', 'id', 'diff_col_name', 'df1_value', 'df2_value']
# compare values
dfm['val_diff'] = dfm.df1_value - dfm.df2_value
# reorder columns
dfm = dfm[['Date', 'df1_value', 'df2_value', 'diff_col_name', 'val_diff', 'id']]
# keep only the values that differ
dfm = dfm[dfm.val_diff != 0]

Out[2001]: 
         Date  df1_value  df2_value diff_col_name  val_diff  id
2  2017-04-14          1         11          col2       -10  a2
3  2017-04-15          1         10          col2        -9  a3
5  2017-04-15          0         20          col3       -20  a3
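If a wide diff is acceptable, newer pandas (1.1 and later, which postdates this question) has DataFrame.compare. This sketch assumes Date and id together identify a row; compare only accepts identically labeled frames, so both are indexed on those keys and sorted first. The result is not the long format asked for; it has one column pair (self = df1, other = df2) per differing column.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['2017-04-14', '2017-04-15'],
                    'col1': [2482, 2483], 'col2': [1, 1], 'col3': [0, 0],
                    'id': ['a2', 'a3']})
df2 = pd.DataFrame({'Date': ['2017-04-15', '2017-04-14'],
                    'col1': [2483, 2482], 'col2': [10, 11], 'col3': [20, 0],
                    'id': ['a3', 'a2']})

# compare() requires the same index and columns, and the row order differs
# between df1 and df2, so align both frames on the key columns.
left = df1.set_index(['Date', 'id']).sort_index()
right = df2.set_index(['Date', 'id']).sort_index()

# Columns with no differences (col1 here) are dropped; equal cells become NaN.
# The columns form a two-level index: (column name, 'self' / 'other').
diff = left.compare(right)
print(diff)
```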