How to compare two CSV files by column and save the differences to a CSV file using pandas (Python)

Time: 2019-03-30 23:09:19

Tags: python pandas

I have two CSV files, shown below:

File1

x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3

File2

x1
x4
x5

I want to create a new file containing only the IDs from File1 that do not appear in File2:
x2
x3
x6

How can I do this with pandas or plain Python?

1 answer:

Answer 0 (score: 1)

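Use Series.isin together with ~ (which inverts the boolean mask) to keep the values of df1[0] that do not exist in df2[0], then select them from the first column with DataFrame.loc and boolean indexing: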
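To see what the inverted isin mask actually selects, here is a minimal in-memory sketch using the question's sample data. It assumes the files are whitespace-separated, as the sample looks (the code that follows uses sep=";", so match whatever delimiter your real files use), and it reads from io.StringIO instead of disk; the names file1_text, file2_text, mask, missing_ids and missing_rows are illustrative only.

```python
import io

import pandas as pd

# Sample data from the question, kept in memory instead of on disk.
# Assumption: the real files are whitespace-separated like this sample.
file1_text = """x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3
"""
file2_text = """x1
x4
x5
"""

df1 = pd.read_csv(io.StringIO(file1_text), sep=r"\s+", header=None)
df2 = pd.read_csv(io.StringIO(file2_text), sep=r"\s+", header=None)

# isin is True where df1's first column appears in df2; ~ flips it
mask = ~df1[0].isin(df2[0])
print(mask.tolist())  # [False, True, True, False, False, True]

# .loc with the boolean mask, selecting only column 0
missing_ids = df1.loc[mask, 0]
print(missing_ids.tolist())  # ['x2', 'x3', 'x6']

# Dropping the column selector keeps the whole matching rows instead
missing_rows = df1.loc[mask]
```

Selecting with `, 0` returns just the IDs the question asks for; `df1.loc[mask]` is the variant to use if the other columns should be saved too.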

import pandas as pd

#hypothetical input paths, replace with your own
file1 = 'file1.csv'
file2 = 'file2.csv'

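#create DataFrame from first file
#(sep must match the file's actual delimiter; for whitespace-separated data like the sample above use sep=r'\s+')
df1 = pd.read_csv(file1, sep=";", header=None)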
print (df1)
    0     1   2
0  x1  10.0  a1
1  x2  10.0  a2
2  x3  11.0  a1
3  x4  10.5  a2
4  x5  10.0  a3
5  x6  12.0  a3

#create DataFrame from second file
df2 = pd.read_csv(file2, header=None, sep='|')
print (df2)
    0
0  x1
1  x4
2  x5

s = df1.loc[~df1[0].isin(df2[0]), 0]
print (s)
1    x2
2    x3
5    x6
Name: 0, dtype: object

#write to file
s.to_csv('new.csv', index=False, header=False)
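The question also allows plain Python. Below is a minimal sketch without pandas, assuming whitespace-separated files with the key in the first field (change `line.split()` to `line.split(';')` for semicolon-delimited files). The helpers find_missing_ids and write_missing_ids are hypothetical names, not part of any library.

```python
def find_missing_ids(file1_lines, file2_lines):
    """Return first-field values of file1_lines that are absent from file2_lines.

    Assumes whitespace-separated lines; blank lines are skipped.
    """
    # Build a set of keys to exclude, giving constant-time membership tests
    exclude = {line.split()[0] for line in file2_lines if line.strip()}

    missing = []
    for line in file1_lines:
        fields = line.split()
        # Keep the key only if the line is non-empty and the key is not excluded
        if fields and fields[0] not in exclude:
            missing.append(fields[0])
    return missing


def write_missing_ids(file1_path, file2_path, out_path):
    """Write one missing key per line to out_path and return the keys."""
    with open(file1_path) as f1, open(file2_path) as f2:
        missing = find_missing_ids(f1, f2)
    with open(out_path, "w") as out:
        out.writelines(key + "\n" for key in missing)
    return missing
```

For example, `write_missing_ids('file1.csv', 'file2.csv', 'new.csv')` (hypothetical paths) would write x2, x3 and x6 for the sample data. Iterating over File1 preserves its row order, so the result lines up with the pandas version.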