Comparing two Pandas dataframes for differences on common dates

Time: 2017-04-13 14:56:12

Tags: python python-2.7 pandas

I have two data frames: one with historical data, and one with the same historical data plus some new rows appended:

import pandas as pd

raw_data1 = {'Series_Date':['2017-03-10','2017-03-11','2017-03-12','2017-03-13','2017-03-14','2017-03-15'],'Value':[1,2,3,4,5,6]}
df_history = pd.DataFrame(raw_data1, columns = ['Series_Date','Value'])
print(df_history)

raw_data2 = {'Series_Date':['2017-03-10','2017-03-11','2017-03-12','2017-03-13','2017-03-14','2017-03-15','2017-03-16','2017-03-17'],'Value':[1,2,3,4,4,5,6,7]}
df_new = pd.DataFrame(raw_data2, columns = ['Series_Date','Value'])
print(df_new)

For every date in df_history, I want to check whether the value in df_new is different. Rows that differ should end up in a df_check dataframe like this:

raw_data3 = {'Series_Date':['2017-03-14','2017-03-15'],'Value_history':[5,6], 'Value_new':[4,5]}
df_check = pd.DataFrame(raw_data3, columns = ['Series_Date','Value_history','Value_new'])
print(df_check)

The key point is that, for each date in df_history, I want to check whether df_new has a value for that date and whether that value is the same.

1 answer:

Answer 0 (score: 0)

Simply run a merge followed by a query filter to capture the records where Value_history does not equal Value_new:

df_check = pd.merge(df_history, df_new, on='Series_Date', suffixes=['_history', '_new'])\
             .query('Value_history != Value_new').reset_index(drop=True)

#   Series_Date  Value_history  Value_new
# 0  2017-03-14              5          4
# 1  2017-03-15              6          5
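
The default inner merge above only keeps dates that appear in both frames, so a date in df_history with no row at all in df_new would silently drop out. If that case also needs flagging, one possible variation (a sketch, not part of the original answer) is a left merge with indicator=True. The sample data below is hypothetical and chosen so that one date changes and one date is missing from df_new.

```python
import pandas as pd

# Hypothetical data: 2017-03-11 has a changed value,
# 2017-03-12 is missing from df_new entirely.
df_history = pd.DataFrame({
    'Series_Date': ['2017-03-10', '2017-03-11', '2017-03-12'],
    'Value': [1, 2, 3],
})
df_new = pd.DataFrame({
    'Series_Date': ['2017-03-10', '2017-03-11', '2017-03-13'],
    'Value': [1, 5, 9],
})

# A left merge keeps every date from df_history; the _merge column
# records whether a matching row was found in df_new.
merged = pd.merge(df_history, df_new, on='Series_Date', how='left',
                  suffixes=('_history', '_new'), indicator=True)

missing = merged['_merge'] == 'left_only'
changed = (merged['_merge'] == 'both') & \
          (merged['Value_history'] != merged['Value_new'])

df_check = (merged[missing | changed]
            .drop(columns='_merge')
            .reset_index(drop=True))
print(df_check)
```

Dates missing from df_new show up with NaN in Value_new, which also turns that column into floats. Dates that exist only in df_new (2017-03-13 here) are ignored, matching the requirement to check only the dates in df_history.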