Pandas DataFrames的差异

时间:2015-04-03 13:56:46

标签: python pandas

我想弄清楚如何显示2个Pandas DataFrames之间的差异。我几乎在那里,但似乎无法弄清楚如何显示包含差异的行的其他数据。

这是我到目前为止所做的:

将DataFrame A与DataFrame B比较:

DataFrame A:

Date  ID_1 ID_2 Value
1-Jan   1   1   5
2-Jan   1   2   6
3-Jan   1   3   4
4-Jan   1   4   2
5-Jan   1   5   8

DataFrame B:

Date  ID_1 ID_2 Value
1-Jan   1   1   5
2-Jan   1   2   6
3-Jan   1   3   4
4-Jan   1   4   2
5-Jan   1   5   55

当前输出:

Date    Column  From To
5-Jan   Value   8    55

期望的输出:

Date    ID_1 ID_2 From  To
5-Jan   1    5     8    55

当前代码:

#stack column(s) where dataframes are not equal
ne_stacked = (df1 != df2).stack()

#create new dataframe from ne_stacked
changed = ne_stacked[ne_stacked]

#change column names
changed.index.names = ['Date', 'Column']

#create array where dataframes are not equal
diff_loc = np.where(df1 != df2)

#create 'from' column
changed_from = df1.values[diff_loc]

#create 'to' column
changed_to = df2.values[diff_loc]

#create a summary dataframe
final = pd.DataFrame({'From': changed_from, 'To': changed_to}, index=changed.index)

print final

2 个答案:

答案 0 :(得分:1)

使用merge

In [29]:

print df_a
    Date  ID_1  ID_2  Value
0  1-Jan     1     1      5
1  2-Jan     1     2      6
2  3-Jan     1     3      4
3  4-Jan     1     4      2
4  5-Jan     1     5      8
In [30]:

print df_b
    Date  ID_1  ID_2  Value
0  1-Jan     1     1      5
1  2-Jan     1     2      6
2  3-Jan     1     3      4
3  4-Jan     1     4      2
4  5-Jan     1     5     55
In [31]:

df_c = pd.merge(df_a, df_b,
                how='outer',
                on=['Date', 'ID_1', 'ID_2'])
df_c.columns = ['Date', 'ID_1', 'ID_2', 'From', 'To']
df_c = df_c[df_c.From!=df_c.To]
print df_c
    Date  ID_1  ID_2  From  To
4  5-Jan     1     5     8  55

答案 1 :(得分:0)

试试这个:

dfm = df1.merge(df2, on=['Date', 'ID_1', 'ID_2']).rename(columns={'Value_x':'From', 'Value_y':'To'})
print dfm[dfm.From != dfm.To]

    Date  ID_1  ID_2  From  To
4  5-Jan     1     5     8  55