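List index out of range error with loan data

Time: 2016-10-27 17:14:13

Tags: python python-2.7

I am trying to recreate this analysis: https://rstudio-pubs-static.s3.amazonaws.com/203258_d20c1a34bc094151a0a1e4f4180c5f6f.html

I couldn't get the shell script to run on my computer, so I wrote code that does basically the same thing: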

import sys 

input_file = sys.argv[1]
output_file = sys.argv[2]

in_fp = open(input_file,"r")
out_fp = open(output_file,"w")

count = 0 

for line in in_fp:
     if count == 1:  
         out_fp.write(line+"\n")
     elif count>1:
         elems = line.split(",")
         loan = elems[16].upper()
         if loan == "FULLY PAID" or loan == "LATE (31-120 DAYS)" or loan == "DEFAULT" or loan == "CHARGED OFF":
             out_fp.write(line+"\n")
     count+=1
in_fp.close()
out_fp.close()

This code works for the 2015 data, but when I run it on the 2012-2013 data I get this error:

File "ShellScript.py", line 16, in <module>
    loan = elems[16].upper()
IndexError: list index out of range

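Can someone tell me how to fix this error so I can get the data sorted? Thanks

1 answer:

Answer 0: (score: 0)

One of your lines doesn't have 17 elements, so elems[16] fails. This is usually caused by a blank line in the data. It can also be caused by a quoted field with an embedded newline, in which case you need to use the csv module.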
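
To illustrate both failure modes, here is a minimal sketch. The sample lines are made up for illustration (they are not taken from the loan files), and it uses only three columns instead of 17. It compares the number of fields that `line.split(",")` produces per physical line with the number `csv.reader` produces per logical record.

```python
import csv

# Hypothetical sample data: a header, one record whose quoted field
# contains a newline (so it spans two physical lines), and a blank line.
raw_lines = [
    'id,desc,loan_status\n',
    '1,"Borrower note\n',
    'continued on a second line",Fully Paid\n',
    '\n',
]

# Naive approach: split every physical line on commas.
naive_counts = [len(line.split(",")) for line in raw_lines]

# csv.reader joins the quoted field back into one value and
# returns an empty list for the blank line.
csv_rows = list(csv.reader(raw_lines))
csv_counts = [len(row) for row in csv_rows]

print(naive_counts)  # field count per physical line
print(csv_counts)    # field count per logical record
```

With the naive split, the two halves of the quoted record and the blank line all come back with fewer than three fields, so indexing the last column would raise IndexError on any of them. With `csv.reader`, only the blank line comes back short (as an empty list), which is why the length check is still needed.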

Here is a rewrite that uses the csv module. It reports and skips short rows. I also made it more Pythonic.

import sys
import csv

input_file = sys.argv[1]
output_file = sys.argv[2]
ncolumns = 17 # IS THIS RIGHT?
keep_loans = {"FULLY PAID", "LATE (31-120 DAYS)", "DEFAULT", "CHARGED OFF"}

# the with statement automatically closes the files after the block
with open(input_file, "rb") as in_fp, open(output_file, "wb") as out_fp:
    reader = csv.reader(in_fp)
    writer = csv.writer(out_fp)
    # you are currently skipping line 0
    next(reader)
    # copy headers
    writer.writerow(next(reader))
    # you are currently adding an extra newline to headers
    # writer.writerow([]) # uncomment if you want that extra newline

    for row_num, row in enumerate(reader, start=2):
        if len(row) < ncolumns:
            # report and skip short rows
            print("row %s shorter than expected. skipping row.  row: %s" % (row_num, row))
            continue
        # use `in` rather than multiple == statements
        if row[16].upper() in keep_loans:
            writer.writerow(row)
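
If you are not sure whether 17 is the right value for ncolumns, one way to check is to count how many rows have each length. This is only a sketch: `column_count_histogram` is a hypothetical helper name, and the sample lines are made up so the example runs on its own.

```python
import csv
from collections import Counter


def column_count_histogram(lines):
    """Map each row length to the number of rows that have that length."""
    return Counter(len(row) for row in csv.reader(lines))


# Made-up sample: two 3-column rows, one blank line, one short row.
sample = ['a,b,c\n', '1,2,3\n', '\n', '4,5\n']
hist = column_count_histogram(sample)
print(sorted(hist.items()))
```

For a real file, you could pass the open file object in place of `sample`. The most common length is most likely the true column count, and any other lengths point at the rows that trigger the IndexError.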