I have a Python script that behaves strangely:
#!/usr/local/bin/python2.7

import csv
import sys
import name_tools as nt

f = open(sys.argv[1], 'r')
try:
    reader = csv.reader(f)
    for row in reader:
        row = [author.strip() for author in row]
        print row[0], row[1], row[2], row[3], len(row)
        if row[1] and row[2]:
            print len(row), row[0], len(row), nt.match(row[1], row[2])
finally:
    f.close()
The input file looks like this:
" ROW_NUMBER "," AUTHOR_3PCATALOG "," AUTHOR_AMAZON "," LABEL "
" 1 "," "," Tavoillot, Pierre-Henri "," NEGATIVE "
" 2 "," "," di Bonaventura, Allegra "," NEGATIVE "
" 3 "," Stephen Jones "," Jones, Stephen "," NEGATIVE "
" 4 "," "," Weisenberger, Dr. Gary J. "," NEGATIVE "
" 5 "," Cassandra Austin "," Irwin "," NEGATIVE "
" 6 "," "," Cain, Paul "," NEGATIVE "
" 473 "," Craig Harline "," William Craig "," NEGATIVE "
For some reason, the record with row number 473 raises the following exception:
Traceback (most recent call last):
  File "./author_preprocessing.py", line 14, in <module>
    print len(row), row[0], len(row), nt.match(row[1], row[2])
  File "build/bdist.linux-x86_64/egg/name_tools/match.py", line 102, in match
  File "build/bdist.linux-x86_64/egg/name_tools/match.py", line 19, in middle_initials
IndexError: string index out of range
The program's output is:
ROW_NUMBER AUTHOR_3PCATALOG AUTHOR_AMAZON LABEL 4
4 ROW_NUMBER 4 0.0
1 Tavoillot, Pierre-Henri NEGATIVE 4
2 di Bonaventura, Allegra NEGATIVE 4
3 Stephen Jones Jones, Stephen NEGATIVE 4
4 3 4 0.98
4 Weisenberger, Dr. Gary J. NEGATIVE 4
5 Cassandra Austin Irwin NEGATIVE 4
4 5 4 0.0
6 Cain, Paul NEGATIVE 4
473 Craig Harline William Craig NEGATIVE 4
4 473 4
Answer 0 (score: 0)
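name_tools appears to split the string arguments it receives internally, and it does not handle multiple spaces between words. In this particular case, Craig Harline had two spaces between Craig and Harline. Normalizing the whitespace with split() and join() fixed it for me.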
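The split()/join() fix described above can be sketched as follows. This is a minimal illustration, not name_tools code: the helper name normalize_spaces is made up here, and it assumes that collapsing every run of whitespace to a single space is acceptable for author names.

```python
def normalize_spaces(name):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    # str.split() with no argument splits on any run of whitespace and
    # drops empty strings, so joining with " " leaves single spaces only.
    return " ".join(name.split())


# Hypothetical input with two spaces between the names, as in the answer.
print(normalize_spaces("Craig  Harline"))
```

In the question's loop, a helper like this could take the place of author.strip() in the list comprehension, so that both arguments are normalized before they reach nt.match.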
Answer 1 (score: 0)
Adding the following line to the code that processes row should give you the answer:
assert len(row)
If you want a more descriptive error trace, use the following instead:
msg = "row must have 4 elements and has {rowlen}. row: {row}"
assert len(row) == 4, msg.format(rowlen=len(row), row=row)
This assertion should go inside the loop, before you try to access any items in the row.
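As a rough sketch of that placement, the loop could look like the following. It is written for Python 3 and reads from an in-memory sample rather than sys.argv[1] so that it runs on its own. The function name check_rows, the EXPECTED_COLUMNS constant, and the sample line are illustrative assumptions, and the name_tools call is left out.

```python
import csv
import io

EXPECTED_COLUMNS = 4  # assumed from the four-column header in the question


def check_rows(lines):
    """Yield stripped rows, asserting the column count before any indexing."""
    msg = "row must have 4 elements and has {rowlen}. row: {row}"
    for row in csv.reader(lines):
        row = [field.strip() for field in row]
        # The assertion runs before row[0]..row[3] are accessed.
        assert len(row) == EXPECTED_COLUMNS, msg.format(rowlen=len(row), row=row)
        yield row


# Hypothetical one-line sample in the same format as the question's input.
sample = io.StringIO('" 3 "," Stephen Jones "," Jones, Stephen "," NEGATIVE "\n')
for row in check_rows(sample):
    print(row[0], row[1], row[2], row[3], len(row))
```

Note that assert statements are skipped when Python runs with the -O flag, so an explicit check that raises ValueError may be a better fit for a script meant to run unattended.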