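I have two DataFrames, and I want to compare how the values change between DF1 and DF2. I know I need to merge the two so that the Status columns line up, but I also want to output only the IDs whose status has changed.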
DF1:
ID Status
1234 Cleared
5678 Validating
4321 Pending
8765 Cleared
9876 Blocked
6789 Blocked
DF2:
ID Status
1234 Blocked
5678 Validating
4321 Pending
8765 Cleared
9876 Validating
6789 Blocked
Desired output:
ID Status1 Status2
1234 Cleared Blocked
9876 Blocked Validating
Answer 0 (score: 2)
Sample data (ID is used as the index):
import pandas as pd
df1 = pd.DataFrame(['Cleared', 'Validating', 'Pending', 'Cleared', 'Blocked', 'Blocked'], index=[1234, 5678, 4321, 8765, 9876, 6789], columns=['Status'])
df1.index.name = 'ID'
df2 = pd.DataFrame(['Blocked', 'Validating', 'Pending', 'Cleared', 'Validating', 'Blocked'], index=[1234, 5678, 4321, 8765, 9876, 6789], columns=['Status'])
df2.index.name = 'ID'
Join df1 and df2, supplying suffixes for the overlapping columns of the joined DataFrame:
df = df1.join(df2, lsuffix='_1', rsuffix='_2')
Then use boolean indexing to keep only the rows where the two statuses differ:
df[df.Status_1 != df.Status_2]
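If ID is an ordinary column rather than the index (as the tables in the question suggest), the same idea can be written with merge instead of join. This is only a sketch under that assumption; the names merged and changed are made up for illustration.

```python
import pandas as pd

ids = [1234, 5678, 4321, 8765, 9876, 6789]
# Assumption: ID is a regular column here, not the index
df1 = pd.DataFrame({'ID': ids,
                    'Status': ['Cleared', 'Validating', 'Pending',
                               'Cleared', 'Blocked', 'Blocked']})
df2 = pd.DataFrame({'ID': ids,
                    'Status': ['Blocked', 'Validating', 'Pending',
                               'Cleared', 'Validating', 'Blocked']})

# Inner merge on ID; the overlapping Status columns get the given suffixes
merged = df1.merge(df2, on='ID', suffixes=('_1', '_2'))

# Boolean indexing: keep only the IDs whose status changed
changed = merged[merged['Status_1'] != merged['Status_2']]
print(changed)
```

Because merge defaults to an inner join, an ID that appears in only one of the two frames is dropped; passing how='outer' would keep it, with NaN on the missing side.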
Answer 1 (score: 0)
This may not be the most efficient approach, but it gets the job done. :)
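df3 = df1.copy()
# Assigning a Series aligns on the ID index, so rows are matched by ID, not by position
df3['Status_df2'] = df2['Status']
# Keep only the rows whose status differs between the two frames
df3 = df3.loc[df3['Status'] != df3['Status_df2']]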
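For what it's worth, newer pandas versions (1.1 and later, as far as I know) also have DataFrame.compare, which may do this in one call when both frames share exactly the same index and columns. A rough sketch, assuming ID is the index as in the sample data from the first answer; the name diff is just illustrative.

```python
import pandas as pd

ids = pd.Index([1234, 5678, 4321, 8765, 9876, 6789], name='ID')
df1 = pd.DataFrame({'Status': ['Cleared', 'Validating', 'Pending',
                               'Cleared', 'Blocked', 'Blocked']}, index=ids)
df2 = pd.DataFrame({'Status': ['Blocked', 'Validating', 'Pending',
                               'Cleared', 'Validating', 'Blocked']}, index=ids)

# compare() requires identically labeled frames; it returns only the rows
# that differ, with 'self' (df1) and 'other' (df2) sub-columns per column
diff = df1.compare(df2)
print(diff)
```

The result has two-level column labels, ('Status', 'self') and ('Status', 'other'), so it needs renaming if flat Status1/Status2 columns are wanted.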