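I want to create a new CSV, "Result.csv", containing the rows of one CSV, "New.csv", that do not exist in another CSV, "Old.csv".
For example,
Old.csv
john Michigan 2018
Michigan 2018
jane Ohio 2017
New.csv
john Michigan 2018
Result.csv
Michigan 2017
I tried the following code in Python after reading another question, but it does not seem to work and gives me the wrong output. Is there anything wrong with the following code? Are there alternatives to it? Could pandas be an option?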
with open('Old.csv', 'r') as f1:
    old = f1.readlines()
with open('New.csv', 'r') as f2:
    new = f2.readlines()
result = open("Result.csv", "w+")
for data in new:
    if data not in old:
        result.write(data)
result.close()
Answer 0 (score: 0)
Assume:
old.csv
john,Michigan,2018
ron,Michigan,2018
jane,Ohio,2017
new.csv
john,Michigan,2018
jane,Ohio,2017
ron,Michigan,2017
jack,New York,2018
Using only pandas:
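import pandas as pd
# open the old CSV as a DataFrame (the files have no header row)
old_df = pd.read_csv("old.csv", header=None)
# open the new CSV as a DataFrame
new_df = pd.read_csv("new.csv", header=None)
# stack old rows on top of new rows; DataFrame.append was removed in pandas 2.0
join_df = pd.concat([old_df, new_df], ignore_index=True)
# drop every row that occurs more than once, which removes rows present in both files
result_df = join_df.drop_duplicates(subset=None, keep=False)
# blank out the remaining rows that came from old.csv, then drop them
result_df = result_df[~result_df.isin(old_df)].dropna()
# the masking step turned the year column into float; convert it back to int
result_df[2] = result_df[2].astype(int)
# save as CSV without header or index
result_df.to_csv("result.csv", header=False, index=False)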
This gives:
result.csv
ron,Michigan,2017
jack,New York,2018
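As for the original loop, one possible cause of unexpected output is that readlines() keeps line terminators, so the last line of a file that has no trailing newline will not compare equal to the same row appearing mid-file in the other CSV. A minimal sketch without pandas follows, assuming comma-separated files with no header row. The function name rows_only_in_new is hypothetical, and the demo uses in-memory data in place of the real files.

```python
import csv
import io


def rows_only_in_new(old_file, new_file):
    """Return rows of new_file that do not appear anywhere in old_file.

    Both arguments are open text files (or any iterable of lines).
    The name and signature are illustrative, not from the original post.
    """
    # csv.reader strips line terminators, so a missing newline on the
    # last line of a file cannot cause a false mismatch.
    old_rows = {tuple(row) for row in csv.reader(old_file) if row}
    return [
        row for row in csv.reader(new_file)
        if row and tuple(row) not in old_rows
    ]


# Demo on in-memory data matching the example above. Note that the old
# data deliberately ends without a trailing newline.
old_data = io.StringIO("john,Michigan,2018\nron,Michigan,2018\njane,Ohio,2017")
new_data = io.StringIO(
    "john,Michigan,2018\njane,Ohio,2017\nron,Michigan,2017\njack,New York,2018\n"
)

result_rows = rows_only_in_new(old_data, new_data)

# Write the surviving rows out in CSV form (here to a string buffer).
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(result_rows)
print(out.getvalue(), end="")
```

With real files, the same function could be called with open("old.csv", newline="") and open("new.csv", newline=""), and the writer pointed at open("result.csv", "w", newline=""). Storing the old rows in a set keeps each membership check cheap compared with scanning a list for every new row.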