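I'm trying to compare two CSV files. The first file (movements.csv) has 14 columns; the second (LCC.csv) has one column. I want to check whether the entry (a string) in column 8 of movements.csv appears anywhere in column 1 of LCC.csv. If it does, 'Yes' should be written in column 14; if not, 'No'. Here is the code I have tried so far, followed by the error message I get: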
import csv
f1 = file('LCC.csv', 'rb')
f2 = file('movements.csv', 'rb')
f3 = ('output.csv', 'wb')
c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)
movements = list(c2)
for LCC_row in c1:
    row = 0
    found = False
    for movements_row in movements:
        output_row = movements_row
        if movements_row[7] == LCC_row[0]
            output_row.append('Yes')
            found = True
            break
        row += 1
    if not found:
        output_row.append('No')
    c3.writerow(output_row)
f1.close()
f2.close()
f3.close()
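I'm a complete Python beginner, so any advice is appreciated! Ideally, the check between the two columns would also ignore whether the strings are written in upper or lower case.
The error message appears after
c3.writerow(output_row)
and reads:
Traceback (most recent call last):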
File "<stdin>", line 1, in <module>
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
>>>
LCC.csv (no header):
Air Ab
Jamb
Sw
AIRF
EURO
movements.csv (with a header):
ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC
Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,
Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A,
Ba,LSZM,2005,200501,191,1/1/2005,Arr,"AIRF",AFR,"PG",LG,Fr,Eu,
Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,
As mentioned, the last column (LCC) is currently completely empty.
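Answer 0 (score: 1)
There are quite a few problems with it. Here are the ones I found after a quick look at the code:
You have an invalid quote in this line:
f2 = file('movements.csv', ,rb')
#                          ^
It should be:
f2 = file('movements.csv', 'rb')
In the code you shared, you have back-quotes in several places instead of single quotes ('). For example, your lines should be:
f1 = file('LCC.csv', 'rb')
f3 = file('output.csv', 'wb')
#    ^ also missing file here
The colon (:) is missing after the if. It should be:
if movements_row[7] == LCC_row[0]:
#                           Here ^
Also, to set a string value you don't need parentheses. Just assign it:
output_row[13] = 'Yes'
#                ^ As simple string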
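To illustrate the last point, here is a minimal sketch in Python 3 syntax. It parses one data row from the question's sample from a string instead of a file. Because each data row ends with a comma, the csv reader already returns an empty 14th field, so assigning to index 13 fills the LCC column, whereas append() would create a 15th field. The names sample_line, assigned and appended are illustrative only.

```python
import csv
import io

# One data row from the question's movements.csv; note the trailing comma.
sample_line = 'Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,\n'

row = next(csv.reader(io.StringIO(sample_line)))

# The trailing comma yields an empty 14th field (index 13).
assigned = list(row)
assigned[13] = 'Yes'      # overwrite the empty LCC field in place

appended = list(row)
appended.append('Yes')    # adds a 15th field instead

print(len(row), len(assigned), len(appended))
```

Which of the two is wanted depends on whether the output should keep exactly 14 columns.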
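Answer 1 (score: 0)
You are trying to do too many things at once. Split the work into separate tasks. First, we read the contents of LCC.csv into a set (we could use a list, but a set is better suited to membership tests). Then we go through movements.csv and rewrite it.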
import csv
with open('LCC.csv', 'rb') as lcc:
    lcc_set = set()
    lcc_r = csv.reader(lcc)
    for l in lcc_r:
        for i in l:
            lcc_set.add(i)
with open('movements.csv', 'rb') as movements:
    mov_r = csv.reader(movements)
    with open('output.csv', 'wb') as output:
        out_w = csv.writer(output)
        for l in mov_r:
            #l.pop()
            if l[7] in lcc_set:
                l.append('Yes')
            else:
                l.append('No')
            out_w.writerow(l)
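It wasn't clear to me whether you want to add a column or replace the last one. I have commented out the line (l.pop()) that would cause the last column to be replaced with Yes or No.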
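The question also asked for the comparison to ignore upper and lower case, which the code above does not do. One possible approach is to normalize the LCC names when building the set and to normalize column 8 before the lookup. This sketch assumes that trimming surrounding whitespace and lowercasing both sides is acceptable. The helper names normalize and flag_rows are hypothetical, and the data is inlined as lists (rows shortened to 8 columns) rather than read from the files.

```python
def normalize(name):
    # Assumption: two names match if they are equal after trimming and lowercasing.
    return name.strip().lower()


def flag_rows(movement_rows, lcc_names):
    """Return copies of movement_rows with 'Yes' or 'No' appended to each row."""
    lcc_set = {normalize(name) for name in lcc_names}
    flagged = []
    for row in movement_rows:
        answer = 'Yes' if normalize(row[7]) in lcc_set else 'No'
        flagged.append(row + [answer])
    return flagged


lcc_names = ['Air Ab', 'Jamb', 'Sw', 'AIRF', 'EURO']
movement_rows = [
    # 'euro' is deliberately lower case to show the case-insensitive match
    ['Zue', 'LSZH', '2005', '200501', '25', '1/1/2005', 'Dep', 'euro'],
    ['Zue', 'LSZH', '2005', '200501', '228', '1/1/2005', 'Dep', 'THA'],
]
result = flag_rows(movement_rows, lcc_names)
```

If the names can contain non-ASCII letters, str.casefold() in Python 3 may be a better fit than lower().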
Answer 2 (score: 0)
There are quite a few errors in your code. They are pointed out here: https://stackoverflow.com/a/41224147/3027854
One issue with movements.csv:
ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC
Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,
Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A,
Ba,LSZM,2005,200501,191,1/1/2005,Arr,"AIRF",AFR,"PG",LG,Fr,Eu,
Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,
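Every row except the header row ends with a trailing ",", so the csv reader returns an extra, empty last field for it. My code handles this by dropping that field before appending the result:
import csv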
f1 = open('LCC.csv', 'rU')
f2 = open('movements.csv', 'rU')
f3 = open('output.csv', 'w')
c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)
# first we will read all LCC values into a set.
LCC_row_values = set()
for LCC_row in c1:
    LCC_row_values.add(LCC_row[0].strip())
row = 0
for movements_row in c2:
    row += 1
    if row == 1:
        # movements_row.append('is_present')
        # c3.writerow(movements_row)
        # skip header of movements.csv file
        continue
    # Remove last extra column from output row
    output_row = movements_row[:-1]
    if movements_row[7] in LCC_row_values:
        output_row.append('Yes')
    else:
        output_row.append('No')
    c3.writerow(output_row)
f1.close()
f2.close()
f3.close()
The sample files used here are:
LCC.csv
Air Ab
Jamb
Sw
AIRF
EURO
movements.csv
ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC
Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,
Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A,
Ba,LSZM,2005,200501,191,1/1/2005,Arr,"AIRF",AFR,"PG",LG,Fr,Eu,
Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,
output.csv
Zue,LSZH,2005,200501,25,1/1/2005,Dep,EURO,EUJ,Mans C,EG,Gb,Eu,Yes
Zue,LSZH,2005,200501,204,1/1/2005,Arr,Sw,SWR,Dar,HA,Tans,A,Yes
Ba,LSZM,2005,200501,191,1/1/2005,Arr,AIRF,AFR,PG,LG,Fr,Eu,Yes
Zue,LSZH,2005,200501,228,1/1/2005,Dep,THA,THA,Bang,VD,Th,As,No
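The code above targets Python 2; the 'rU' mode was deprecated in Python 3 and is no longer accepted by recent releases. A rough Python 3 equivalent is sketched below. It opens the files with newline='' so that the csv module handles line endings itself, which is the documented way to avoid newline problems like the one in the question. Unlike the code above, this sketch copies the header row to the output instead of skipping it. The function name add_lcc_flag is hypothetical, and the demo runs on temporary copies of part of the sample data.

```python
import csv
import os
import tempfile


def add_lcc_flag(lcc_path, movements_path, output_path):
    # newline='' leaves line-ending handling to the csv module.
    with open(lcc_path, newline='') as lcc_file:
        lcc_values = {row[0].strip() for row in csv.reader(lcc_file) if row}

    with open(movements_path, newline='') as src, \
            open(output_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            flag = 'Yes' if row[7] in lcc_values else 'No'
            # Drop the empty trailing field, then add the flag in its place.
            writer.writerow(row[:-1] + [flag])


# Demo on throwaway files holding part of the sample data.
workdir = tempfile.mkdtemp()
paths = [os.path.join(workdir, name)
         for name in ('LCC.csv', 'movements.csv', 'output.csv')]
with open(paths[0], 'w', newline='') as f:
    f.write('Air Ab\nJamb\nSw\nAIRF\nEURO\n')
with open(paths[1], 'w', newline='') as f:
    f.write('ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC\n'
            'Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,\n'
            'Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,\n')

add_lcc_flag(*paths)
with open(paths[2], newline='') as f:
    output_rows = list(csv.reader(f))
```

Using with blocks also closes the files automatically, so the explicit close() calls are not needed.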