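I'm trying to figure out how to show the differences between two pandas DataFrames. I'm almost there, but I can't work out how to display the other columns of the rows that contain differences.
Here is what I have so far:
Comparing DataFrame A with DataFrame B: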
DataFrame A:
Date ID_1 ID_2 Value
1-Jan 1 1 5
2-Jan 1 2 6
3-Jan 1 3 4
4-Jan 1 4 2
5-Jan 1 5 8
DataFrame B:
Date ID_1 ID_2 Value
1-Jan 1 1 5
2-Jan 1 2 6
3-Jan 1 3 4
4-Jan 1 4 2
5-Jan 1 5 55
Current output:
Date Column From To
5-Jan Value 8 55
Desired output:
Date ID_1 ID_2 From To
5-Jan 1 5 8 55
Current code:
import numpy as np
import pandas as pd

# Presumably df1 and df2 have identical shape and labels, with Date as the index
# Stack the boolean mask of cells where the DataFrames are not equal
ne_stacked = (df1 != df2).stack()
# Keep only the entries that are flagged as different
changed = ne_stacked[ne_stacked]
# Name the index levels
changed.index.names = ['Date', 'Column']
# Find the positions where the DataFrames are not equal
diff_loc = np.where(df1 != df2)
# Create the 'From' values
changed_from = df1.values[diff_loc]
# Create the 'To' values
changed_to = df2.values[diff_loc]
# Create a summary DataFrame
final = pd.DataFrame({'From': changed_from, 'To': changed_to}, index=changed.index)
print(final)
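For reference, one way to get the ID columns into the result while keeping the stack-based approach is to move them into the index before comparing. This is only a sketch: it assumes `Date`, `ID_1` and `ID_2` together identify a row and that both frames have the same rows in the same order. The frames are rebuilt from the sample data above.

```python
import numpy as np
import pandas as pd

keys = ["Date", "ID_1", "ID_2"]  # assumed to identify a row uniquely
dates = ["1-Jan", "2-Jan", "3-Jan", "4-Jan", "5-Jan"]

# Sample data from the question, with the key columns moved into the index
df1 = pd.DataFrame({"Date": dates, "ID_1": [1] * 5, "ID_2": [1, 2, 3, 4, 5],
                    "Value": [5, 6, 4, 2, 8]}).set_index(keys)
df2 = pd.DataFrame({"Date": dates, "ID_1": [1] * 5, "ID_2": [1, 2, 3, 4, 5],
                    "Value": [5, 6, 4, 2, 55]}).set_index(keys)

# Boolean mask of differing cells (requires identically labelled frames)
ne = df1 != df2
ne_stacked = ne.stack()
changed = ne_stacked[ne_stacked]
changed.index.names = keys + ["Column"]

# Positions of the differing cells, in the same row-major order as the stack
rows, cols = np.where(ne)
final = pd.DataFrame({"From": df1.values[rows, cols],
                      "To": df2.values[rows, cols]},
                     index=changed.index)

# Turn the index levels back into ordinary columns
out = final.reset_index()
print(out)
```

The result keeps a `Column` column as well, which can be dropped if only `Value` is ever compared.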
Answer 0 (score: 1)
Use merge:
In [29]:
print(df_a)
Date ID_1 ID_2 Value
0 1-Jan 1 1 5
1 2-Jan 1 2 6
2 3-Jan 1 3 4
3 4-Jan 1 4 2
4 5-Jan 1 5 8
In [30]:
print(df_b)
Date ID_1 ID_2 Value
0 1-Jan 1 1 5
1 2-Jan 1 2 6
2 3-Jan 1 3 4
3 4-Jan 1 4 2
4 5-Jan 1 5 55
In [31]:
df_c = pd.merge(df_a, df_b,
                how='outer',
                on=['Date', 'ID_1', 'ID_2'])
df_c.columns = ['Date', 'ID_1', 'ID_2', 'From', 'To']
df_c = df_c[df_c.From != df_c.To]
print(df_c)
Date ID_1 ID_2 From To
4 5-Jan 1 5 8 55
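A self-contained variant of the same idea, in case it helps to run it directly. It rebuilds the two frames from the sample data and uses the `suffixes` argument of `pd.merge` plus `rename` instead of overwriting `columns` positionally. The suffix names `_from` and `_to` are just illustrative choices.

```python
import pandas as pd

keys = ["Date", "ID_1", "ID_2"]
dates = ["1-Jan", "2-Jan", "3-Jan", "4-Jan", "5-Jan"]

# Sample data from the question
df_a = pd.DataFrame({"Date": dates, "ID_1": [1] * 5, "ID_2": [1, 2, 3, 4, 5],
                     "Value": [5, 6, 4, 2, 8]})
df_b = df_a.assign(Value=[5, 6, 4, 2, 55])

# Merge on the key columns; the overlapping Value column gets the two suffixes
merged = pd.merge(df_a, df_b, how="outer", on=keys, suffixes=("_from", "_to"))
merged = merged.rename(columns={"Value_from": "From", "Value_to": "To"})

# Keep only the rows whose value changed
diff = merged[merged["From"] != merged["To"]]
print(diff)
```

Renaming by name rather than by position should keep working if more columns are added to the frames later.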
Answer 1 (score: 0)
Try this:
dfm = df1.merge(df2, on=['Date', 'ID_1', 'ID_2']).rename(columns={'Value_x':'From', 'Value_y':'To'})
print(dfm[dfm.From != dfm.To])
Date ID_1 ID_2 From To
4 5-Jan 1 5 8 55
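One thing to be aware of: `merge` defaults to an inner join, so rows that exist in only one of the frames are dropped, whereas `how='outer'` keeps them with `NaN` on the missing side. The sketch below illustrates the difference on made-up data (the `6-Jan` row is hypothetical and not part of the question) and uses `indicator=True` to show where each row came from.

```python
import pandas as pd

keys = ["Date", "ID_1", "ID_2"]

# Hypothetical frames: 4-Jan exists only in df1, 6-Jan only in df2
df1 = pd.DataFrame({"Date": ["4-Jan", "5-Jan"], "ID_1": [1, 1],
                    "ID_2": [4, 5], "Value": [2, 8]})
df2 = pd.DataFrame({"Date": ["5-Jan", "6-Jan"], "ID_1": [1, 1],
                    "ID_2": [5, 6], "Value": [55, 7]})

names = {"Value_x": "From", "Value_y": "To"}

# Inner join (the default): only keys present in both frames survive
inner = df1.merge(df2, on=keys).rename(columns=names)
inner_diff = inner[inner.From != inner.To]

# Outer join: unmatched rows are kept, with NaN on the missing side
outer = df1.merge(df2, on=keys, how="outer", indicator=True).rename(columns=names)
outer_diff = outer[outer.From != outer.To]  # NaN != x is True, so these rows show up

print(inner_diff)
print(outer_diff)
```

Which behaviour is preferable depends on whether added or removed rows should count as differences.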