I am writing Python code to search, delete, and replace columns in a CSV file. I have 3 files.
Input.csv:
aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
mmmmmmmm,nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx
delete.csv:
aaaaaaaa
eeeeeeee
uuuuuuuu
replace.csv:
iiiiiiii,11111111,22222222
mmmmmmmm,33333333,44444444
Here is my code:
input_file='input.csv'
new_array=[]
for line in open(input_file):
    data=line.split(',')
    a==data[0]
    b=data[1]
    c=data[2]
    d=data[3]
    for line2 in open(delete):
        if (name in line2)==True:
            break
    else:
        for line1 in open(replace):
            data1=line1.split(',')
            aa=data1[0]
            replaced_a=data1[1]
            repalced_b=data1[2]
            if (data[0]==data1[0]):
                data[0]=data1[1]
                data[2]=data1[2]
                new_array=data
                print(new_array)
            else:
                new_array=data
My logic is:
1) Open input.csv and read it line by line.
2) Load the elements into an array.
3) Compare the first element against the whole of delete.csv.
4) If it is found in delete.csv, do nothing and move on to the next line.
5) If it is not found in delete.csv, compare it with replace.csv.
6) If the first element is found in the first column of replace.csv, replace it with the corresponding second column of replace.csv, and replace the second element with the corresponding third column of replace.csv.
7) Load this array into a bigger 10-element array.
So the output I want is:
11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
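Right now I am facing the following problems:
1) Lines that are present in neither replace.csv nor delete.csv are not printed.
2) My input.csv may contain a newline inside a single entry, so reading line by line is a problem. However, data that is spread over several lines is guaranteed to be enclosed in quotes. For example: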
aaaaa,bbbb,ccccc,"ddddddddddd
ddddddd"
11111,2222,3333,4444
Any help with bringing the code and my logic together would be appreciated.
Answer 0 (score: 2)
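I would suggest changing this up a bit:
read the contents of replace into a dictionary
read the contents of delete into a set
loop over your data and use both lookups to "do the right thing".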
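The idea behind the two lookups can be sketched on in-memory data before any files are involved. This is only an illustration: the helper name `apply_rules` and the hard-coded sample rows are assumptions made for this sketch, not part of the original answer.

```python
# Hypothetical in-memory stand-ins for input.csv, delete.csv and replace.csv.
rows = [
    ["aaaaaaaa", "bbbbbb", "cccccc", "ddddddd"],
    ["iiiiiiii", "jjjjjj", "kkkkkk", "lllllll"],
    ["qqqqqqqq", "rrrrrr", "ssssss", "ttttttt"],
]
delete = {"aaaaaaaa"}                             # keys of rows to drop
replace = {"iiiiiiii": ["11111111", "22222222"]}  # key -> new first two columns


def apply_rules(rows, delete, replace):
    """Drop rows keyed in `delete`; overwrite the first two columns of rows keyed in `replace`."""
    result = []
    for row in rows:
        key = row[0]
        if key in delete:
            continue                               # skip deleted rows
        if key in replace:
            result.append(replace[key] + row[2:])  # new columns 1-2, rest unchanged
        else:
            result.append(row)                     # keep untouched rows as they are
    return result


result = apply_rules(rows, delete, replace)
print(result)
```

Because the `else` branch keeps every row that is in neither lookup, rows that are neither deleted nor replaced still end up in the result. Membership tests on a set or dict also avoid re-reading delete.csv and replace.csv for every input line.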
I changed your data a bit to incorporate the "escaped" data you mentioned, which includes newlines:
File creation:
with open("i.csv","w") as f:
    f.write("""
aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
"mmmm
mmmm",nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx""")
with open ("d.csv","w") as f:
    f.write("""
aaaaaaaa
eeeeeeee
uuuuuuuu""")
with open ("r.csv","w") as f:
    f.write("""
iiiiiiii,11111111,22222222
"mmmm
mmmm",33333333,44444444""")
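As a small aside on why the quoted newlines matter, the following sketch compares naive line splitting with `csv.reader` on the sample from the question. It uses an in-memory `io.StringIO` buffer instead of a real file; the variable names are chosen for this illustration only.

```python
import csv
import io

# The two-record sample from the question; the first record spans two physical lines.
text = 'aaaaa,bbbb,ccccc,"ddddddddddd\nddddddd"\n11111,2222,3333,4444\n'

# Naive approach: split on physical lines, then on commas.
naive = [line.split(",") for line in text.splitlines()]

# csv.reader keeps a quoted field together even when it contains a newline.
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive))    # 3 physical lines
print(len(parsed))   # 2 logical records
print(parsed[0][3])  # the embedded newline is preserved inside the field
```

The naive split sees three "rows", one of them a fragment, while `csv.reader` yields the two records that were intended.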
Program:
import csv

def read_file(fn):
    """Return all non-empty rows of a CSV file as lists of strings."""
    rows = []
    # newline="" lets the csv module handle newlines inside quoted fields itself
    with open(fn, newline="") as f:
        reader = csv.reader(f, quotechar='"', delimiter=",")
        for row in reader:
            if row:  # eliminate empty rows from data read
                rows.append(row)
    return rows

# create a dict for the replace stuff: key -> [new 1st column, new 2nd column]
replace = {x[0]: x[1:] for x in read_file("r.csv")}
# create a set for the delete stuff
delete = set(row[0] for row in read_file("d.csv"))
# collect what we need to write back
result = []
# https://docs.python.org/3/library/csv.html
with open("i.csv", newline="") as f:
    reader = csv.reader(f, quotechar='"')
    for row in reader:
        if row:
            if row[0] in delete:
                continue  # skip data row
            elif row[0] in replace:
                # replace the first two columns with the mapping, add rest of row
                result.append(replace[row[0]] + row[2:])  # replace data
            else:
                result.append(row)  # use as is
# write result back into file
with open("done.csv", "w", newline="") as f:
    w = csv.writer(f, quotechar='"', delimiter=",")
    w.writerows(result)
Check the result:
with open ("done.csv") as f:
    print(f.read())
Output:
11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
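In this sample the only field with an embedded newline is a key that gets replaced, so no quoted field shows up in the output. If a row that is kept as is carried such a field, `csv.writer` would quote it again when writing. A minimal sketch with in-memory buffers (the sample row is taken from the question; the variable names are assumptions for this illustration):

```python
import csv
import io

# A row whose last field contains a newline, as in the question's example.
row = ["aaaaa", "bbbb", "ccccc", "ddddddddddd\nddddddd"]

# Write it with the default dialect; newline="" mirrors how the file is opened for writing.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)
written = buffer.getvalue()

# Read the written text back to confirm the field survives the round trip.
round_trip = next(csv.reader(io.StringIO(written, newline="")))

print(repr(written))
print(round_trip == row)
```

The writer adds the quotes only where they are needed, so plain fields stay unquoted.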
Documentation: https://docs.python.org/3/library/csv.html