Question

I have two CSV files like the following:

File1

x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3

File2

x1
x4
x5

I want to create a new file containing the values from the first column of File1 that do not appear in File2:

x2
x3
x6

How can I do this using pandas or plain Python?

Answer 1

Use Series.isin with ~ (negation) to keep the values of df1[0], the first column, that do not exist in df2[0], selecting them with DataFrame.loc and boolean indexing:

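import pandas as pd

# file1 and file2 are the paths to the two input files

# create a DataFrame from the first file; the sample shown in the question is
# whitespace-separated, so change sep if the real file uses another delimiter
df1 = pd.read_csv(file1, sep=r"\s+", header=None)
print(df1)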
    0     1   2
0  x1  10.0  a1
1  x2  10.0  a2
2  x3  11.0  a1
3  x4  10.5  a2
4  x5  10.0  a3
5  x6  12.0  a3

# create a DataFrame from the second file; it has a single column, so sep only
# needs to be a character that never occurs in the data
df2 = pd.read_csv(file2, header=None, sep='|')
print(df2)
    0
0  x1
1  x4
2  x5

# keep the first-column values of df1 that do not occur in df2[0]
s = df1.loc[~df1[0].isin(df2[0]), 0]
print(s)
1    x2
2    x3
5    x6
Name: 0, dtype: object

# write the result to a new file, without the index or a header row
s.to_csv('new.csv', index=False, header=False)
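
Since the question also allows plain Python, the same filter can be sketched without pandas. This is only a minimal illustration, not part of the answer above: the helper name first_fields_not_in is hypothetical, the two files are replaced by in-memory io.StringIO objects holding the sample data, and File1 is assumed to be whitespace-separated as displayed in the question.

```python
import io


def first_fields_not_in(rows, exclude_lines):
    """Return the first field of each row whose value is not in exclude_lines.

    Hypothetical helper for illustration; rows and exclude_lines are any
    iterables of text lines (for example, open file objects).
    """
    # Build a set for fast membership tests, skipping blank lines
    exclude = {line.strip() for line in exclude_lines if line.strip()}
    result = []
    for line in rows:
        fields = line.split()  # assumes whitespace-separated columns
        if fields and fields[0] not in exclude:
            result.append(fields[0])
    return result


# In-memory stand-ins for File1 and File2 from the question
file1 = io.StringIO(
    "x1 10.00 a1\n"
    "x2 10.00 a2\n"
    "x3 11.00 a1\n"
    "x4 10.50 a2\n"
    "x5 10.00 a3\n"
    "x6 12.00 a3\n"
)
file2 = io.StringIO("x1\nx4\nx5\n")

missing = first_fields_not_in(file1, file2)
print(missing)  # ['x2', 'x3', 'x6']

# One value per line, as it would be written to the new file
output_text = "\n".join(missing) + "\n"
```

With real files, passing the objects returned by open(...) in place of the io.StringIO objects should work the same way, and output_text can be written out with an ordinary file write.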
