Comparing the differences in one column between two DataFrames in pandas

Time: 2018-09-27 17:18:38

Tags: python pandas

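I have two datasets and I am trying to compare just one column between them, matching rows on a unique ID. I want to track and flag every value change in that column and output those changes to another DataFrame.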

DF1:

ID      Status
1234    Cleared
4321    Pending
5678    Distributed
8765    Validating
2468    Blocked
8642    Pending
1357    Pending
7531    Distributed

DF2:

ID      Status
1234    Distributed
4321    Pending
5678    Pending
8765    Cleared
2468    Blocked
8642    Blocked
1357    Cleared
7531    Blocked

Output:

ID      Status       Status
1234    Cleared      Distributed
5678    Distributed  Pending
8765    Validating   Cleared
8642    Pending      Blocked
1357    Pending      Cleared
7531    Distributed  Blocked

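Finally, depending on the Status column, I am also trying to check for changes in another column. That column contains a list of countries given as standard ISO Alpha-2 country codes. I considered doing a simple character count here, but that does not make sense: if US is removed and replaced with DE, the character count stays the same.

My code for all of this (adapted from other questions here) is below, but I feel there is probably a more efficient way...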

import pandas as pd

changed = []  # DataFrame.append never modified output_df in place, so collect rows here
for index, compare_row in compare_df.iterrows():
    row_df1 = df1.loc[df1['ID'] == compare_row['ID']]
    row_df2 = df2.loc[df2['ID'] == compare_row['ID']]
    if row_df1.iloc[0]['Status'] != row_df2.iloc[0]['Status']:
        print("here 1")
        changed.extend([row_df1, row_df2])
    elif (row_df1.iloc[0]['Status'] in ['Cleared', 'Distributed']) and (row_df1.iloc[0]['Territory'] != row_df2.iloc[0]['Territory']):
        print("here 2")
        changed.extend([row_df1, row_df2])
# Assumes output_df already exists as a DataFrame, as in the original snippet
output_df = pd.concat([output_df] + changed)

3 answers:

Answer 0 (score: 2)

Use merge

df3 = df1.merge(df2, left_index = True, right_index = True)
mask = df3['Status_x'] == df3['Status_y']
df3 = df3[~mask]
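
Note that merging with left_index and right_index pairs rows by their index labels, so on frames with the default integer index it only compares the right rows if both frames list the IDs in the same order. A minimal sketch of one way around that, assuming ID is unique in each frame: set ID as the index before merging. The three-row frames and the reordering of df2 below are hypothetical, made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1234, 4321, 5678],
    "Status": ["Cleared", "Pending", "Distributed"],
})
# Same IDs as df1, but deliberately in a different row order (hypothetical).
df2 = pd.DataFrame({
    "ID": [5678, 1234, 4321],
    "Status": ["Pending", "Distributed", "Pending"],
})

# With ID as the index, the index-based merge pairs rows by ID
# rather than by row position.
a = df1.set_index("ID")
b = df2.set_index("ID")
df3 = a.merge(b, left_index=True, right_index=True)

# Keep only the IDs whose Status differs between the two frames.
changed = df3[df3["Status_x"] != df3["Status_y"]]
print(changed)
```

Here 4321 is dropped because it is Pending in both frames, while 1234 and 5678 are kept.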

Answer 1 (score: 0)

This may not be the most efficient approach, but it at least gets the job done. :)

df3 = df1.copy()
df3['Status_df2'] = df2.Status.copy()
df3 = df3.loc[df3.Status != df3.Status_df2]
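
A related built-in worth knowing about is DataFrame.compare, which is not part of this answer. The sketch below assumes pandas 1.1 or later (where compare was added) and that both frames contain exactly the same IDs in the same order, since compare only accepts identically labeled frames. The data is copied from the question.

```python
import pandas as pd

ids = [1234, 4321, 5678, 8765, 2468, 8642, 1357, 7531]
df1 = pd.DataFrame({
    "ID": ids,
    "Status": ["Cleared", "Pending", "Distributed", "Validating",
               "Blocked", "Pending", "Pending", "Distributed"],
})
df2 = pd.DataFrame({
    "ID": ids,
    "Status": ["Distributed", "Pending", "Pending", "Cleared",
               "Blocked", "Blocked", "Cleared", "Blocked"],
})

# compare() keeps only the rows where at least one value differs and
# labels the two sides "self" (df1) and "other" (df2).
diff = df1.set_index("ID").compare(df2.set_index("ID"))
print(diff)
```

The result has a two-level column index, so the df1 values are under ("Status", "self") and the df2 values under ("Status", "other").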

Answer 2 (score: 0)

Using .query can make this more readable.

DF1.merge(DF2, on = 'ID').query('Status_x != Status_y')

Output:

     ID     Status_x     Status_y
0  1234      Cleared  Distributed
2  5678  Distributed      Pending
3  8765   Validating      Cleared
5  8642      Pending      Blocked
6  1357      Pending      Cleared
7  7531  Distributed      Blocked
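
The second part of the question (changes in the country-code column) could be handled the same way. This is only a sketch under assumptions: the question does not show that column, so the name Territory is taken from the asker's code, and the comma-separated format and the sample rows are hypothetical. Comparing the codes as sets avoids the character-count problem (US replaced by DE) and ignores reordering.

```python
import pandas as pd

# Hypothetical sample data; the comma-separated Territory format is an assumption.
df1 = pd.DataFrame({
    "ID": [1234, 4321, 5678, 8765],
    "Status": ["Cleared", "Pending", "Distributed", "Validating"],
    "Territory": ["US,GB", "FR", "US,CA", "JP"],
})
df2 = pd.DataFrame({
    "ID": [1234, 4321, 5678, 8765],
    "Status": ["Cleared", "Pending", "Distributed", "Cleared"],
    "Territory": ["DE,GB", "FR", "CA,US", "JP"],
})

def to_code_set(codes):
    """Split a comma-separated string of country codes into a set."""
    return frozenset(c.strip() for c in codes.split(",") if c.strip())

merged = df1.merge(df2, on="ID", suffixes=("_df1", "_df2"))

# Compare the territories as sets, row by row.
territory_differs = pd.Series(
    [to_code_set(a) != to_code_set(b)
     for a, b in zip(merged["Territory_df1"], merged["Territory_df2"])],
    index=merged.index,
)

status_changed = merged["Status_df1"] != merged["Status_df2"]
# Mirrors the asker's elif branch: territory changes only matter
# when the df1 status is Cleared or Distributed.
territory_changed = (
    merged["Status_df1"].isin(["Cleared", "Distributed"]) & territory_differs
)

result = merged[status_changed | territory_changed]
print(result)
```

With this sample data, 1234 is flagged because US was swapped for DE, 8765 because its status changed, and 5678 is not flagged because its codes were only reordered.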