I have a mapper with the following code:
    #!/usr/bin/env python
    import sys

    myList = []
    n = 10  # Number of top N records

    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split data values into list
        data = line.split(";")
        # convert balance (currently a string) to int
        try:
            balance = int(data[6])
        except ValueError:
            # ignore/discard this line
            continue
        # add (balance, record) tuple to list
        myList.append( (balance, line) )
        # sort list in reverse order
        myList.sort(reverse=True)
        # keep only first N records
        if len(myList) > n:
            myList = myList[:n]

    # Print top N records
    for (k,v) in myList:
        print(v)
It produces this error at line 20:
balance = int(data[6])
IndexError: list index out of range
The process tried to write to a nonexistent pipe.
Here is a sample of the dataset:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome y
30 unemployed married primary no 1787 no no cellular 19 oct 79 1 -1 0 unknown no
33 services married secondary no 4789 yes yes cellular 11 may 220 1 339 4 failure no
Any ideas?
Answer 0 (score: 0)
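There are a couple of problems. Your sample data appears to be tab-delimited, but you are splitting on ";". If so, no line contains a ";", so split returns a single-element list and data[6] raises the IndexError. Try "\t" instead. Also, field 6 is not balance, it is housing; counting from 0, balance is field 5.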
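A minimal sketch of the mapper logic with both fixes applied, assuming the input really is tab-delimited as the sample suggests. The function name `top_n_by_balance`, the constant `BALANCE_FIELD`, and the inline sample rows are hypothetical, introduced only for illustration. It also catches IndexError so that short or malformed lines are skipped rather than crashing the job.

```python
BALANCE_FIELD = 5  # 0-based index of "balance" in the sample header (assumption)


def top_n_by_balance(lines, n=10, sep="\t"):
    """Return the n input lines with the largest integer balance."""
    records = []
    for line in lines:
        line = line.strip()
        data = line.split(sep)
        try:
            balance = int(data[BALANCE_FIELD])
        except (ValueError, IndexError):
            # header row, non-numeric balance, or too few fields: skip the line
            continue
        records.append((balance, line))
        # keep the list small: sort descending and truncate to n entries
        records.sort(reverse=True)
        records = records[:n]
    return [line for _, line in records]


# Hypothetical sample rows, shortened to the first seven columns.
sample = [
    "age\tjob\tmarital\teducation\tdefault\tbalance\thousing",
    "30\tunemployed\tmarried\tprimary\tno\t1787\tno",
    "33\tservices\tmarried\tsecondary\tno\t4789\tyes",
]
for record in top_n_by_balance(sample, n=1):
    print(record)
```

In the actual mapper you would pass sys.stdin instead of the sample list, for example `for record in top_n_by_balance(sys.stdin): print(record)`.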
If you are going to do many tasks like this, take a look at Python's built-in csv module.
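As a rough illustration of the csv module, here is a sketch that reads tab-delimited text with `csv.DictReader`, so fields can be looked up by column name instead of by position. The helper `read_balances` and the `SAMPLE` text are hypothetical, and the sketch assumes the first line of the input is the header row.

```python
import csv
import io

# Hypothetical tab-delimited sample, shortened to the first seven columns.
SAMPLE = (
    "age\tjob\tmarital\teducation\tdefault\tbalance\thousing\n"
    "30\tunemployed\tmarried\tprimary\tno\t1787\tno\n"
    "33\tservices\tmarried\tsecondary\tno\t4789\tyes\n"
)


def read_balances(stream, delimiter="\t"):
    """Return (balance, job) pairs for every row with a valid integer balance."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    result = []
    for row in reader:
        try:
            result.append((int(row["balance"]), row["job"]))
        except (KeyError, TypeError, ValueError):
            # missing column, short row, or non-numeric balance: skip the row
            continue
    return result


print(read_balances(io.StringIO(SAMPLE)))
```

Looking fields up by name avoids the off-by-one mistake between field 5 and field 6, at the cost of requiring a header line in the input.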