I am comparing two CSV files, but the update.csv file I get is the same as new.csv.
import csv
with open('old.csv', 'r') as t1:
    old_csv = t1.readlines()
with open('new.csv', 'r') as t2:
    new_csv = t2.readlines()
with open('update.csv', 'w') as out_file:
    line_in_new = 0
    line_in_old = 0
    while line_in_new < len(new_csv) and line_in_old < len(old_csv):
        if old_csv[line_in_old] != new_csv[line_in_new]:
            out_file.write(new_csv[line_in_new])
        else:
            line_in_old += 1
        line_in_new += 1
I want the output to match the sample below.
Example:
Input:
old.csv
a,b,c
1,2,3
4,5,6
8,9,9
new.csv
a,b,c
1,2,3
5,6,7
8,9,7
Output:
update.csv
4,5,6,deleted
5,6,7,new added
8,9,9,change
Please help me get this result in update.csv.
Answer 0 (score: 2)
A solution using pandas:
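import pandas as pd
df1 = pd.read_csv('old.csv')
df2 = pd.read_csv('new.csv')
# Tag each row with the file it came from.
df1['flag'] = 'old'
df2['flag'] = 'new'
df = pd.concat([df1, df2])
# Compare on every column except 'flag'; keep=False drops all copies of rows present in both files.
dups_dropped = df.drop_duplicates(subset=df.columns.difference(['flag']), keep=False)
dups_dropped.to_csv('update.csv', index=False)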
Input:
old.csv
a,b,c
1,2,3
4,5,6
new.csv
a,b,c
1,2,3
5,6,7
Output:
update.csv
a,b,c,flag
4,5,6,old
5,6,7,new
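
The pandas result above flags rows only as old or new, while the sample in the question also asks for a "change" label. Below is a minimal sketch of one way to get those three labels with just the standard library. It rests on assumptions that are not stated in the question: the first column is treated as a unique key for each row, a changed row is reported with its old values (as in the question's sample), and output is sorted by key as text. The names `diff_rows` and `read_rows` are hypothetical helpers, and the CSV content is held in strings so the example runs on its own.

```python
import csv
import io


def diff_rows(old_rows, new_rows):
    """Label rows as 'deleted', 'new added', or 'change', keyed on column 0."""
    # Assumption: the first column uniquely identifies a row.
    old_by_key = {row[0]: row for row in old_rows}
    new_by_key = {row[0]: row for row in new_rows}

    result = []
    # Keys are strings, so this sort is lexicographic, not numeric.
    for key in sorted(old_by_key.keys() | new_by_key.keys()):
        if key not in new_by_key:
            result.append(old_by_key[key] + ["deleted"])
        elif key not in old_by_key:
            result.append(new_by_key[key] + ["new added"])
        elif old_by_key[key] != new_by_key[key]:
            # Same key, different values: report the old row, as in the sample.
            result.append(old_by_key[key] + ["change"])
    return result


def read_rows(text):
    """Parse CSV text and drop the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1:]


# Sample data from the question, held in memory instead of files.
old_text = "a,b,c\n1,2,3\n4,5,6\n8,9,9\n"
new_text = "a,b,c\n1,2,3\n5,6,7\n8,9,7\n"

updates = diff_rows(read_rows(old_text), read_rows(new_text))
for row in updates:
    print(",".join(row))
```

With the question's sample data this prints 4,5,6,deleted, then 5,6,7,new added, then 8,9,9,change. To work with real files, the rows could be read with csv.reader from old.csv and new.csv, and `updates` written to update.csv with csv.writer(...).writerows(updates), opening the output file with newline=''.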