I have two CSV files like the following:
File1
x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3
File2
x1
x4
x5
I want to create a new file containing
x2
x3
x6
using pandas or plain Python.
Answer
Use Series.isin with ~ (logical NOT) to keep the values in the first column, df1[0], that do not exist in df2[0], and select them with DataFrame.loc and boolean indexing:
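import pandas as pd
# hypothetical file names; replace with your own paths
file1, file2 = 'File1.csv', 'File2.csv'
# create DataFrame from first file (assumes ';' separators; for whitespace use sep=r'\s+')
df1 = pd.read_csv(file1, sep=";", header=None)
print(df1)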
    0     1   2
0  x1  10.0  a1
1  x2  10.0  a2
2  x3  11.0  a1
3  x4  10.5  a2
4  x5  10.0  a3
5  x6  12.0  a3
# create DataFrame from second file; sep='|' (absent from the data) keeps each line as one column
df2 = pd.read_csv(file2, header=None, sep='|')
print(df2)
    0
0  x1
1  x4
2  x5
# isin marks rows whose first column appears in df2[0]; ~ inverts that mask
s = df1.loc[~df1[0].isin(df2[0]), 0]
print(s)
1    x2
2    x3
5    x6
Name: 0, dtype: object
# write the filtered values to a new file, one per line, without index or header
s.to_csv('new.csv', index=False, header=False)
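If File1 is really whitespace-separated, as it appears in the question, rather than semicolon-separated, a minimal self-contained sketch of the same isin/~ filter might look like this. It is only an illustration under assumptions: io.StringIO stands in for the real files, and sep=r"\s+" splits each line on runs of whitespace.

```python
import io

import pandas as pd

# Sample contents copied from the question; whitespace-separated, no header row.
file1 = io.StringIO(
    "x1 10.00 a1\nx2 10.00 a2\nx3 11.00 a1\n"
    "x4 10.50 a2\nx5 10.00 a3\nx6 12.00 a3\n"
)
file2 = io.StringIO("x1\nx4\nx5\n")

# sep=r"\s+" splits on any run of whitespace instead of a fixed ";".
df1 = pd.read_csv(file1, sep=r"\s+", header=None)
df2 = pd.read_csv(file2, header=None)

# Keep first-column values of df1 that are absent from df2's only column.
s = df1.loc[~df1[0].isin(df2[0]), 0]

# Write to an in-memory buffer here; pass a path such as 'new.csv' for a real file.
out = io.StringIO()
s.to_csv(out, index=False, header=False)
print(out.getvalue())
```

The filtering line is identical to the answer's; only the separator and the input source differ.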
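For the plain-Python part of the question, a rough sketch without pandas could put File2's keys in a set and scan File1 once. ids_not_in is a hypothetical helper name, and the code assumes whitespace-separated rows with the key in the first field (switch to line.split(';') if the file really uses semicolons).

```python
import io
from typing import Iterable, List


def ids_not_in(rows: Iterable[str], exclude: Iterable[str]) -> List[str]:
    """Return the first field of each row whose value does not appear in exclude.

    Hypothetical helper: assumes whitespace-separated rows (as in the question)
    and one key per line in exclude. Original row order is preserved.
    """
    # A set makes each membership test O(1) on average.
    excluded = {line.strip() for line in exclude if line.strip()}
    result = []
    for line in rows:
        fields = line.split()
        # Skip blank lines; keep the key only if File2 does not list it.
        if fields and fields[0] not in excluded:
            result.append(fields[0])
    return result


# In-memory stand-ins for File1 and File2; replace with open(...) on real files.
file1 = io.StringIO(
    "x1 10.00 a1\nx2 10.00 a2\nx3 11.00 a1\n"
    "x4 10.50 a2\nx5 10.00 a3\nx6 12.00 a3\n"
)
file2 = io.StringIO("x1\nx4\nx5\n")

missing = ids_not_in(file1, file2)
print(missing)  # ['x2', 'x3', 'x6']
```

To produce new.csv, you could pass open file objects instead of the StringIO buffers and write "\n".join(missing) followed by a newline to the output file.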