This is the code I am writing:
import csv
import openpyxl

def read_file(fn):
    rows = []
    with open(fn) as f:
        reader = csv.reader(f, quotechar='"', delimiter=",")
        for row in reader:
            if row:
                rows.append(row)
    return rows

replace = {x[0]: x[1:] for x in read_file("replace.csv")}
delete = set(row[0] for row in read_file("delete.csv"))
result = []
input_file = "input.csv"
with open(input_file) as f:
    reader = csv.reader(f, quotechar='"')
    for row in reader:
        if row:
            if row[7] in delete:
                continue
            elif row[7] in replace:
                result.append(replace[row[7]])
            else:
                result.append(row)
with open("done.csv", "w+", newline="") as f:
    w = csv.writer(f, quotechar='"', delimiter=",")
    w.writerows(result)
These are my files:
input.csv:
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-","aaaaa","-","-","bbbbb","-",","
"-","-","-","-","-","-","-","ccccc","-","-","ddddd","-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","
This is a 13-column CSV. I am only interested in the 8th and 11th fields.
This is my replace.csv:
"aaaaa","11111","22222"
delete.csv:
ccccc
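So what I am doing is comparing the first column of replace.csv (line by line) with the 8th column of input.csv. If they match, the 8th column of input.csv is replaced with the 2nd column of replace.csv, and the 11th column of input.csv with the 3rd column of replace.csv. For delete.csv, the first column is compared with the 8th column of input.csv in the same way, and if a match is found the entire row is deleted. If a row matches nothing in replace.csv or delete.csv, it is written out unchanged. So my desired output is: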
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-",11111,"-","-",22222,"-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","
But when I run this code, it gives me this output:
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
11111,22222
Where am I going wrong? I am trying to adapt the program from a question I posted earlier, because the input file has changed. https://stackoverflow.com/a/54388144/9279313
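Answer 0 (score: 2)
@anuj I think SafeDev's solution is the best option, but if you don't want to use pandas, a small change to your code is enough. Instead of appending only the replacement values, overwrite the two fields in the row and append the whole row: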
for row in reader:
    if row:
        if row[7] in delete:
            continue
        elif row[7] in replace:
            key = row[7]
            row[7] = replace[key][0]
            row[10] = replace[key][1]
            result.append(row)
        else:
            result.append(row)
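To see the change in context, here is a minimal sketch of the whole flow using only the standard library. The helper name `apply_rules` and its `key_col`/`target_col` parameters are made up for illustration, and the sample data from the question is held in memory instead of being read from input.csv, replace.csv and delete.csv.

```python
import csv
import io


def apply_rules(rows, replace, delete, key_col=7, target_col=10):
    """Apply the delete/replace rules to CSV rows (hypothetical helper).

    rows       -- iterable of lists, one list per CSV record
    replace    -- dict mapping a key to [new key_col value, new target_col value]
    delete     -- set of keys whose rows are dropped
    key_col    -- index of the column compared against the keys (8th column)
    target_col -- index of the second column to overwrite (11th column)
    """
    result = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        key = row[key_col]
        if key in delete:
            continue  # drop the whole row
        if key in replace:
            row = list(row)  # copy so the caller's row is left untouched
            row[key_col] = replace[key][0]
            row[target_col] = replace[key][1]
        result.append(row)
    return result


# Sample data copied from the question; in-memory only, for illustration.
INPUT = '''c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-","aaaaa","-","-","bbbbb","-",","
"-","-","-","-","-","-","-","ccccc","-","-","ddddd","-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","
'''

rows = list(csv.reader(io.StringIO(INPUT)))
replace = {"aaaaa": ["11111", "22222"]}
delete = {"ccccc"}
out = apply_rules(rows, replace, delete)

# Write the result to a string buffer instead of done.csv.
buf = io.StringIO()
csv.writer(buf).writerows(out)
print(buf.getvalue())
```

Note that `csv.writer` only quotes fields that need it by default, so the "-" fields come out unquoted; passing `quoting=csv.QUOTE_ALL` to the writer quotes every field.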
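Hope this solves your problem.
Answer 1 (score: 1)
It's actually quite simple. Instead of starting from scratch, just use the pandas library. From there it is much easier to handle any dataset. Here is how you would do it:
Edit: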
import pandas as pd

input_csv = pd.read_csv('input.csv')
# replace.csv and delete.csv have no header row; read their values as text
replace_csv = pd.read_csv('replace.csv', header=None, dtype=str)
delete_csv = pd.read_csv('delete.csv', header=None, dtype=str)
r_lst = list(replace_csv.iloc[:, 0])
d_lst = list(delete_csv.iloc[:, 0])
input2_csv = input_csv.copy()
for i, row in input_csv.iterrows():
    if row['c8'] in r_lst:
        # Position of the matching row in replace.csv
        j = r_lst.index(row['c8'])
        input2_csv.loc[i, 'c8'] = replace_csv.iloc[j, 1]
        input2_csv.loc[i, 'c11'] = replace_csv.iloc[j, 2]
    if row['c8'] in d_lst:
        # Drop every row whose c8 matches this delete.csv entry
        input2_csv = input2_csv[input2_csv.c8 != row['c8']]
input2_csv.to_csv('output.csv', index=False)
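You can make this more dynamic by turning it into a function that takes the column names as parameters, and replacing 'c8' and 'c11' with those two parameters.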
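One way that function could look is sketched below. The name `replace_and_delete` and the parameters `key_col` and `target_col` are assumptions made for this example, not part of the original answer, and the sketch uses `isin` and `map` rather than the row-by-row loop. It assumes replace.csv is read with `header=None`, so its columns are labelled 0, 1 and 2, and that all files are read as text.

```python
import io

import pandas as pd


def replace_and_delete(df, replace_df, delete_keys, key_col, target_col):
    """Drop rows whose key_col value is in delete_keys, then overwrite
    key_col and target_col for rows whose key appears in replace_df.

    replace_df has three header-less columns: 0 = key to match,
    1 = new key_col value, 2 = new target_col value.
    """
    new_key = dict(zip(replace_df[0], replace_df[1]))
    new_target = dict(zip(replace_df[0], replace_df[2]))

    # Keep only rows that are not marked for deletion.
    out = df[~df[key_col].isin(delete_keys)].copy()

    # Rows whose key has a replacement.
    hit = out[key_col].isin(list(new_key))

    # Update target_col first, while key_col still holds the original key.
    out.loc[hit, target_col] = out.loc[hit, key_col].map(new_target)
    out.loc[hit, key_col] = out.loc[hit, key_col].map(new_key)
    return out


# Sample data from the question, held in memory for illustration.
INPUT = '''c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-","aaaaa","-","-","bbbbb","-",","
"-","-","-","-","-","-","-","ccccc","-","-","ddddd","-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","
'''

df = pd.read_csv(io.StringIO(INPUT), dtype=str)
replace_df = pd.read_csv(io.StringIO('"aaaaa","11111","22222"\n'),
                         header=None, dtype=str)
delete_keys = pd.read_csv(io.StringIO("ccccc\n"), header=None, dtype=str)[0]

result = replace_and_delete(df, replace_df, delete_keys, "c8", "c11")
print(result.to_csv(index=False))
```

With real files, the three `io.StringIO(...)` arguments would simply be the file names.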