I have two CSV files (exported from Excel) that load into the following DataFrames:
import pandas as pd
df1 = {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 0, 2]}
df1 = pd.DataFrame(df1, columns=df1.keys())
df2 = {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink', 'SC-002_Signinlink'], 'Count': [2, 1, 2, 1]}
df2 = pd.DataFrame(df2, columns=df2.keys())
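In df2 there is one extra transaction, 'SC-002_Signinlink', that does not exist in df1. Can someone help me find only those extra transactions and write them to a file?
So far, I have done the following to get the transactions...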
merged_df = pd.merge(df1, df2, on = 'Transaction_Name', suffixes=('_df1', '_df2'), how='outer')
Answer 0 (score: 1)
Use indicator=True in the merge:
import pandas as pd
df1 = {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 0, 2]}
df1 = pd.DataFrame(df1, columns=df1.keys())
df2 = {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink', 'SC-002_Signinlink'], 'Count': [2, 1, 2, 1]}
df2 = pd.DataFrame(df2, columns=df2.keys())
df = pd.merge(df1, df2, on='Transaction_Name', how='outer', indicator=True)
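# Because we do not merge on Count, the result has two count columns (Count_x from df1, Count_y from df2).
# Rows that exist in only one frame get NaN in the other frame's count column, so fill those with 0
# and build a single Count column as the sum of the two.
df.Count_x = df.Count_x.fillna(0)
df.Count_y = df.Count_y.fillna(0)
print(df.dtypes)
df['Count'] = df.Count_x + df.Count_y
# _merge != 'both' keeps rows found in only one of the frames;
# use df._merge == 'right_only' to keep only the rows that are extra in df2.
df = df.loc[df._merge != 'both', ['Transaction_Name', 'Count']]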
print(df)
# Missing transactions list :
print(df.Transaction_Name.values.tolist())
Output of print(df.dtypes):
Transaction_Name      object
Count_x              float64
Count_y                int64
_merge              category
dtype: object
Output of print(df):
    Transaction_Name  Count
3  SC-002_Signinlink    1.0
Output of print(df.Transaction_Name.values.tolist()):
['SC-002_Signinlink']
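The question also asked to write the extra transactions to a file, which the code above does not do. A minimal sketch of one way to do that, assuming only the rows that are extra in df2 are wanted (so it filters on 'right_only' rather than != 'both'). The output file name extra_transactions.csv is a placeholder, not something from the original post:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
    'Count': [1, 0, 2],
})
df2 = pd.DataFrame({
    'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage',
                         'SC-001_Signinlink', 'SC-002_Signinlink'],
    'Count': [2, 1, 2, 1],
})

# Merge only df1's names against df2, so the single Count column comes from df2.
# indicator=True adds a _merge column: 'left_only', 'right_only' or 'both'.
merged = pd.merge(df1[['Transaction_Name']], df2,
                  on='Transaction_Name', how='outer', indicator=True)

# 'right_only' marks transactions present in df2 but missing from df1.
extra = merged.loc[merged['_merge'] == 'right_only', ['Transaction_Name', 'Count']]

# Placeholder output path (an assumption); index=False omits the row index.
extra.to_csv('extra_transactions.csv', index=False)
```

Because df1 contributes only its Transaction_Name column here, there are no Count_x / Count_y columns to fill and add up.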