I have two datasets.
The first one has a single column containing a list of email addresses:
DF1
Email
xxxx@abc.gov
xxxx@abc.gov
xxxx@abc.gov
xxxx@abc.gov
xxxx@abc.gov
The second CSV, Dataframe2:
Email
xxxx@abc.gov
xxxx@abc.gov
xxxx@abc.gov
xxxx@abc.gov
dddd@abc.com
dddd@abc.com
3333@abc.com
import pandas as pd
SansList = r'C:\Sans compare\SansList.csv'
AllUsers = r'C:\Sans compare\AllUser.csv'
## print Name column only and turn into data sets from CSV ##
df1 = pd.read_csv(SansList, usecols=[0])
df2 = pd.read_csv(AllUsers, usecols=[2])
print(df1['Email'].isin(df2)==False)
I would like the result to be:
Dataframe3
dddd@abc.com
dddd@abc.com
3333@abc.com
I'm not sure how to fix my datasets... :(
Answer 0 (score: 1)
Option 1
isin
df2[~df2.Email.isin(df1.Email)]
Email
4 dddd@abc.com
5 dddd@abc.com
6 3333@abc.com
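To try Option 1 without the CSV files, here is a minimal sketch that builds both frames in memory from the sample data in the question (the literal values are only that sample, and df3 is just the name the question uses for the result). It also hints at what goes wrong in the original attempt: isin was given the whole DataFrame df2, and iterating over a DataFrame yields its column labels, so the comparison most likely ran against the label 'Email' rather than the addresses.

```python
import pandas as pd

# Sample data copied from the question; the real data would come from read_csv.
df1 = pd.DataFrame({"Email": ["xxxx@abc.gov"] * 5})
df2 = pd.DataFrame({
    "Email": ["xxxx@abc.gov"] * 4
    + ["dddd@abc.com", "dddd@abc.com", "3333@abc.com"]
})

# True where a df2 address also appears in df1 (column against column).
in_df1 = df2["Email"].isin(df1["Email"])

# ~ inverts the boolean mask, keeping only the rows missing from df1.
df3 = df2[~in_df1]
print(df3)
```

Note that the mask is built from df2 and applied to df2, so the original row labels of df2 are preserved in the result.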
Option 2
query
df2.query('Email not in @df1.Email')
Email
4 dddd@abc.com
5 dddd@abc.com
6 3333@abc.com
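The @ prefix in a query string refers to a variable in the calling scope, so the lookup values do not have to be a DataFrame column. A small sketch under that assumption, using the sample data from the question and a hypothetical local list named known:

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Email": ["xxxx@abc.gov"] * 5})
df2 = pd.DataFrame({
    "Email": ["xxxx@abc.gov"] * 4
    + ["dddd@abc.com", "dddd@abc.com", "3333@abc.com"]
})

# "known" is an illustrative name: the distinct addresses present in df1.
known = df1["Email"].unique().tolist()

# @known pulls the local list into the query expression.
df3 = df2.query("Email not in @known")
print(df3)
```

Deduplicating first keeps the lookup list short when df1 has many repeated addresses, as it does here.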
Option 3
merge
The indicator=True argument of pd.DataFrame.merge adds a _merge column showing which dataframe each row came from. We can then filter on it.
df2.merge(
    df1, how='outer', indicator=True
).query('_merge == "left_only"').drop(columns='_merge')
Email
20 dddd@abc.com
21 dddd@abc.com
22 3333@abc.com
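Because both frames repeat the same address, the merge above pairs each of the 4 matching rows in df2 with each of the 5 in df1, which is why the index in the output starts at 20. A possible variation, not part of the original answer, is to deduplicate df1 first and use a left merge, so the merged frame keeps one row per df2 row. This sketch uses the sample data from the question:

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Email": ["xxxx@abc.gov"] * 5})
df2 = pd.DataFrame({
    "Email": ["xxxx@abc.gov"] * 4
    + ["dddd@abc.com", "dddd@abc.com", "3333@abc.com"]
})

# Deduplicating df1 avoids multiplying rows for repeated addresses.
merged = df2.merge(
    df1.drop_duplicates(), on="Email", how="left", indicator=True
)

# _merge is "both" for addresses found in df1, "left_only" otherwise.
df3 = merged.loc[merged["_merge"] == "left_only", ["Email"]]
print(df3)
```

The merge builds a fresh index for the result, so the row labels come from the merged frame rather than from df2.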
Answer 1 (score: 1)
NumPy solution:
In [311]: df2[~np.in1d(df2.Email, df1.Email)]
Out[311]:
Email
4 dddd@abc.com
5 dddd@abc.com
6 3333@abc.com
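On recent NumPy versions np.in1d is deprecated (as of NumPy 2.0) in favour of np.isin, which accepts the same two arguments here. A sketch of the same filter with the imports spelled out, again using the sample data from the question:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Email": ["xxxx@abc.gov"] * 5})
df2 = pd.DataFrame({
    "Email": ["xxxx@abc.gov"] * 4
    + ["dddd@abc.com", "dddd@abc.com", "3333@abc.com"]
})

# Element-wise membership test on the underlying arrays;
# the result is a plain NumPy boolean array aligned with df2's rows.
mask = np.isin(df2["Email"].to_numpy(), df1["Email"].to_numpy())

# Invert the mask to keep the addresses that are absent from df1.
df3 = df2[~mask]
print(df3)
```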