String index out of range error with name_tools

Date: 2014-07-20 05:31:31

Tags: python csv

I have a Python script that is behaving strangely:

#!/usr/local/bin/python2.7
import csv
import sys
import name_tools as nt
f = open(sys.argv[1], 'r')
try:
    reader = csv.reader(f)
    for row in reader:
        row = [author.strip() for author in row]
        print row[0], row[1], row[2], row[3], len(row)
        if row[1] and row[2]:
            print len(row),  row[0], len(row), nt.match(row[1], row[2])
finally:
    f.close()

The input file looks like this:

" ROW_NUMBER "," AUTHOR_3PCATALOG "," AUTHOR_AMAZON "," LABEL "
" 1 ","  "," Tavoillot, Pierre-Henri "," NEGATIVE "
" 2 ","  "," di Bonaventura, Allegra "," NEGATIVE "
" 3 "," Stephen Jones "," Jones, Stephen "," NEGATIVE "
" 4 ","  "," Weisenberger, Dr. Gary J. "," NEGATIVE "
" 5 "," Cassandra Austin "," Irwin "," NEGATIVE "
" 6 ","  "," Cain, Paul "," NEGATIVE "
" 473 "," Craig  Harline ","  William Craig     "," NEGATIVE "

For some reason, the record with row number 473 raises the following exception:

Traceback (most recent call last):
  File "./author_preprocessing.py", line 14, in <module>
    print len(row),  row[0], len(row), nt.match(row[1], row[2])
  File "build/bdist.linux-x86_64/egg/name_tools/match.py", line 102, in match
  File "build/bdist.linux-x86_64/egg/name_tools/match.py", line 19, in middle_initials
IndexError: string index out of range
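The name_tools source is not shown here, so the following is only a hypothetical sketch of one way this exact error can arise: if a library splits a name on a single space and then indexes the first character of every token, a doubled space produces an empty token, and indexing an empty string raises IndexError. The function `initials` is invented for illustration and is not the actual name_tools code.

```python
# Hypothetical stand-in for logic that can fail inside a name matcher;
# this is NOT the actual name_tools source.
def initials(name):
    # Splitting on a single space keeps empty tokens between repeated spaces.
    parts = name.split(' ')
    # token[0] raises IndexError when a token is the empty string ''.
    return [token[0] for token in parts]


print('Craig Harline'.split(' '))    # ['Craig', 'Harline']
print('Craig  Harline'.split(' '))   # ['Craig', '', 'Harline']
print(initials('Craig Harline'))     # ['C', 'H']

try:
    initials('Craig  Harline')
except IndexError as exc:
    print('IndexError:', exc)        # IndexError: string index out of range
```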

The output of the program is:

ROW_NUMBER AUTHOR_3PCATALOG AUTHOR_AMAZON LABEL 4
4 ROW_NUMBER 4 0.0
1  Tavoillot, Pierre-Henri NEGATIVE 4
2  di Bonaventura, Allegra NEGATIVE 4
3 Stephen Jones Jones, Stephen NEGATIVE 4
4 3 4 0.98
4  Weisenberger, Dr. Gary J. NEGATIVE 4
5 Cassandra Austin Irwin NEGATIVE 4
4 5 4 0.0
6  Cain, Paul NEGATIVE 4
473 Craig  Harline William Craig NEGATIVE 4
4 473 4

2 answers:

Answer 0 (score: 0)

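name_tools seems to split the string arguments it receives internally, and it does not handle multiple spaces between words. In this particular case, Craig  Harline has two spaces between Craig and Harline. Normalizing the names with split() and join() fixed it for me.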
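A minimal sketch of the split()/join() normalization, assuming it is applied to every CSV field in place of the plain strip() in the question's script. `str.split()` with no argument splits on any run of whitespace and discards empty strings, so joining the pieces with a single space both trims the ends and collapses internal runs. The helper name `normalize_whitespace` and the inline sample data are illustrative only.

```python
import csv
import io


def normalize_whitespace(name):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    # split() with no argument drops empty tokens; join() rebuilds the name.
    return ' '.join(name.split())


# Two lines copied from the question's input file.
SAMPLE = '''" ROW_NUMBER "," AUTHOR_3PCATALOG "," AUTHOR_AMAZON "," LABEL "
" 473 "," Craig  Harline ","  William Craig     "," NEGATIVE "
'''

rows = []
for row in csv.reader(io.StringIO(SAMPLE)):
    # Normalize every field instead of only stripping the ends.
    rows.append([normalize_whitespace(field) for field in row])

print(rows[1])  # ['473', 'Craig Harline', 'William Craig', 'NEGATIVE']
```

In the original script, the same list comprehension would replace `row = [author.strip() for author in row]`, so `nt.match(row[1], row[2])` only ever sees single-spaced names.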

Answer 1 (score: 0)

Assert the length of row

Adding the following line to your code should give you the answer:

assert len(row)

If you want a more descriptive error message, use this instead:

msg = "row must have 4 elements and has {rowlen}. row: {row}"
assert len(row) == 4, msg.format(rowlen=len(row), row=row)

This assertion should go inside the loop, before you access any items in the row.
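A minimal sketch of where the assertion sits, assuming the CSV is read from an in-memory sample rather than `sys.argv[1]`. Note that the bare `assert len(row)` only checks that the row is non-empty, while `len(row) == 4` checks the exact field count. The generator `read_rows` and its `expected` parameter are hypothetical names used for illustration.

```python
import csv
import io

SAMPLE = '''" ROW_NUMBER "," AUTHOR_3PCATALOG "," AUTHOR_AMAZON "," LABEL "
" 1 ","  "," Tavoillot, Pierre-Henri "," NEGATIVE "
" 473 "," Craig  Harline ","  William Craig     "," NEGATIVE "
'''


def read_rows(f, expected=4):
    msg = "row must have {expected} elements and has {rowlen}. row: {row}"
    for row in csv.reader(f):
        row = [field.strip() for field in row]
        # Check the row's shape inside the loop, before indexing into it.
        assert len(row) == expected, msg.format(
            expected=expected, rowlen=len(row), row=row)
        yield row


rows = list(read_rows(io.StringIO(SAMPLE)))
print(len(rows))  # 3

try:
    list(read_rows(io.StringIO('" 7 "," only two "\n')))
except AssertionError as exc:
    print(exc)  # row must have 4 elements and has 2. row: ['7', 'only two']
```

With the sample input above, every row has four fields, so the assertion passes; it only fires on the malformed two-field row.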