MapReduce - list index out of range in mapper

Asked: 2017-12-07 22:20:05

Tags: python mapreduce mapper

I have a mapper with the following code:

#!/usr/bin/env python
import sys

myList = []
n = 10  # Number of top N records

for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split data values into list
    data = line.split(";")

    # convert balance (currently a string) to int
    try:
        balance = int(data[6])
    except ValueError:
        # ignore/discard this line
        continue

    # add (balance, record) tuple to list
    myList.append( (balance, line) )
    # sort list in reverse order
    myList.sort(reverse=True)

    # keep only first N records
    if len(myList) > n:
        myList = myList[:n]

# Print top N records
for (k,v) in myList:
    print(v)

It raises this error at line 20:

balance = int(data[6])

IndexError: list index out of range

It looks like the code is trying to access a field that does not exist.

Here is a sample of the dataset:

age job marital education   default balance housing loan    contact day month   duration    campaign    pdays   previous    poutcome    y
30  unemployed  married primary no  1787    no  no  cellular    19  oct 79  1   -1  0   unknown no
33  services    married secondary   no  4789    yes yes cellular    11  may 220 1   339 4   failure no

Any ideas?

1 Answer:

Answer 0 (score: 0)

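There are a couple of problems here. Your sample data appears to be tab-delimited, but you are splitting on ";", so each line comes back as a single-element list and data[6] raises the IndexError. Try "\t" instead. Also, index 6 is not balance, it is housing; use index 5.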
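As a rough sketch of those two fixes, assuming the input really is tab-delimited and that balance sits at index 5 as in the sample header, the parsing could look like the following. The names parse_balance, top_n and BALANCE_INDEX are made up for illustration and are not part of the original mapper.

```python
import sys

# Assumed from the sample header: age, job, marital, education, default, balance, ...
BALANCE_INDEX = 5


def parse_balance(line, sep="\t"):
    """Return the balance as an int, or None if the line should be skipped."""
    fields = line.strip().split(sep)
    try:
        return int(fields[BALANCE_INDEX])
    except (IndexError, ValueError):
        # Too few fields, or a non-numeric value such as the header row
        return None


def top_n(lines, n=10):
    """Return the n (balance, record) tuples with the largest balance."""
    records = []
    for line in lines:
        balance = parse_balance(line)
        if balance is None:
            continue
        records.append((balance, line.strip()))
    # Sort once at the end instead of after every append
    records.sort(reverse=True)
    return records[:n]


def main(stream=sys.stdin):
    """Print the top records read from the given stream."""
    for _balance, record in top_n(stream):
        print(record)
```

Calling main() at the bottom of the script would make it behave like the original mapper. Catching IndexError alongside ValueError means short or malformed lines are skipped rather than crashing the job.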

If you are going to be doing a lot of tasks like this, take a look at Python's built-in csv module.
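For instance, a minimal sketch with csv.DictReader, assuming the first line of the input is the tab-delimited header shown in the question, lets you refer to the column by name instead of by position. The function name read_balances is hypothetical.

```python
import csv
import io


def read_balances(stream):
    """Yield (balance, row) pairs, skipping rows without a usable balance."""
    reader = csv.DictReader(stream, delimiter="\t")
    for row in reader:
        try:
            yield int(row["balance"]), row
        except (KeyError, TypeError, ValueError):
            # Missing column, short row (value is None), or non-numeric text
            continue


# Small in-memory demo; in a real mapper you would pass sys.stdin instead
demo = io.StringIO(
    "age\tjob\tbalance\thousing\n"
    "30\tunemployed\t1787\tno\n"
    "33\tservices\t4789\tyes\n"
)
for balance, row in read_balances(demo):
    print(balance, row["job"])
```

Note that this only works if the header line reaches the mapper. In a MapReduce job the input is split across mappers, so only one split would normally contain the header; in that case csv.reader with a fixed column index may be the safer choice.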