Question

I am trying to recreate this analysis: https://rstudio-pubs-static.s3.amazonaws.com/203258_d20c1a34bc094151a0a1e4f4180c5f6f.html

I couldn't get the shell script to run on my computer, so I wrote code that does essentially the same thing:

import sys 

input_file = sys.argv[1]
output_file = sys.argv[2]

in_fp = open(input_file,"r")
out_fp = open(output_file,"w")

count = 0 

for line in in_fp:
     if count == 1:  
         out_fp.write(line+"\n")
     elif count>1:
         elems = line.split(",")
         loan = elems[16].upper()
         if loan == "FULLY PAID" or loan == "LATE (31-120 DAYS)" or loan == "DEFAULT" or loan == "CHARGED OFF":
             out_fp.write(line+"\n")
     count+=1
in_fp.close()
out_fp.close()

This code works for the 2015 data, but when I run it on the 2012-2013 data I get this error:

File "ShellScript.py", line 16, in <module>
    loan = elems[16].upper()
IndexError: list index out of range

Can someone tell me how to fix this error so I can get the data sorted? Thanks

Answer 1

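One of your lines doesn't have 17 elements, so elems[16] fails. This is usually caused by blank lines in the data. It can also be caused by quoted fields with embedded newlines: a record that spans several physical lines gets cut into pieces when you read the file line by line. If that is the cause, you need to use the csv module.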
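
To see the difference, here is a minimal sketch with made-up sample data (the rows and column names are invented for illustration and are not taken from the loan files). A quoted field containing a comma and a newline trips up line-by-line split(","), while csv.reader treats it as one record:

```python
import csv
import io

# Hypothetical sample data: the second record has a quoted field
# containing both a comma and an embedded newline.
SAMPLE = (
    'id,desc,loan_status\n'
    '1,"Borrower added, on 01/01\nmore text",Fully Paid\n'
    '2,plain,Current\n'
)

def naive_field_counts(text):
    """Count fields per physical line using str.split, as the question's script does."""
    return [len(line.split(",")) for line in text.splitlines()]

def csv_field_counts(text):
    """Count fields per logical record using the csv module."""
    return [len(row) for row in csv.reader(io.StringIO(text, newline=""))]

print(naive_field_counts(SAMPLE))  # one physical line comes up short
print(csv_field_counts(SAMPLE))    # every record has the same width
```

With the naive approach, the continuation line has fewer fields than the header, which is the same situation that makes elems[16] raise IndexError.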

Here is a rewrite using the csv module. It reports and skips short rows. I also made it more Pythonic.

import sys
import csv

input_file = sys.argv[1]
output_file = sys.argv[2]
ncolumns = 17 # IS THIS RIGHT?
keep_loans = {"FULLY PAID", "LATE (31-120 DAYS)", "DEFAULT", "CHARGED OFF"}

# the with statement automatically closes the files after the block
# Python 3: open in text mode with newline="" as the csv module expects
# (on Python 2, use "rb" and "wb" instead)
with open(input_file, "r", newline="") as in_fp, open(output_file, "w", newline="") as out_fp:
    reader = csv.reader(in_fp)
    writer = csv.writer(out_fp)
    # you are currently skipping line 0
    next(reader)
    # copy headers
    writer.writerow(next(reader))
    # you are currently adding an extra newline to headers
    # writer.writerow([]) # uncomment if you want that extra newline

    for row_num, row in enumerate(reader, start=2):
        if len(row) < ncolumns:
            # report and skip short rows
            print("row %s shorter than expected. skipping row.  row: %s" % (row_num, row))
            continue
        # use `in` rather than multiple == statements
        if row[16].upper() in keep_loans:
            writer.writerow(row)
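
If you are not sure the status is always in column 16 (the layout may differ between the 2015 and 2012-2013 files), one option is to look the column up by name in the header row instead of hard-coding the index. This is only a sketch: the column name loan_status and the helper name filter_loans are assumptions, so check them against your file's header. It also assumes the header is the first line it reads, so skip any preamble line before calling it.

```python
import csv
import io

KEEP_LOANS = {"FULLY PAID", "LATE (31-120 DAYS)", "DEFAULT", "CHARGED OFF"}

def filter_loans(in_fp, out_fp, status_column="loan_status"):
    """Copy the header and the rows whose status is in KEEP_LOANS.

    status_column is an assumed header name. Returns the number of
    rows skipped because they were too short to contain that column.
    """
    reader = csv.reader(in_fp)
    writer = csv.writer(out_fp)
    header = next(reader)
    status_idx = header.index(status_column)  # raises ValueError if the name is wrong
    writer.writerow(header)
    skipped = 0
    for row in reader:
        if len(row) <= status_idx:
            # blank or truncated row: count it and move on
            skipped += 1
            continue
        if row[status_idx].upper() in KEEP_LOANS:
            writer.writerow(row)
    return skipped

# Small in-memory demo with invented rows, including one blank line.
src = io.StringIO("id,loan_status\n1,Fully Paid\n\n2,Current\n3,Charged Off\n", newline="")
dst = io.StringIO(newline="")
skipped = filter_loans(src, dst)
print(skipped)
print(dst.getvalue())
```

For real files, pass handles opened with newline="" in place of the StringIO objects.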
