Strange behavior using Python's multiprocessing library

Time: 2014-04-13 21:19:57

Tags: python list file-io multiprocessing pycharm

I am trying to read a file using the Python multiprocessing library, but I am not getting the desired result. This is the code I am using:

import multiprocessing as mp
import itertools

partitioned = {}
partitioned['0-20'] = []
partitioned['20-40'] = []
partitioned['40-60'] = []
partitioned['60+'] = []
output = []

def map_func1(f):
    # for line in f:
    gen = f[14:15] #15 1=male 2=female
    age = f[17:19] #18-19
    htin = f[1947:1950] #1948-1950 tall in inches, self reported !888! !999!
    wtlbs = f[1950:1953] #1951-1953 wt in lbs, self reported !888! !999!
    ovwt = f[1963:1964] #1964 consider myself overweight 1,under 2,over 3, !8!, !9!
    chwt = f[1964:1965] #1965 change weight or stay same 1=more, 2=less, 3=same, !8!, !9!
    output.append([gen, age, htin, wtlbs, ovwt, chwt])
    return output

def partitioner(m):
    for element in m:
        if int(element[1]) < 20:
            output['0-20'].append(element)
        elif int(element[1]) < 40:
            output['20-40'].append(element)
        elif int(element[1]) < 60:
            output['40-60'].append(element)
        else:
            output['60+'].append(element)

    return partitioned

if __name__ == "__main__":
    pool = mp.Pool(processes=3)
    f = open('adult.dat')
    m = pool.map(map_func1, f)
    print len(output)
    print len(m)
    p = partitioner(m)
    print p

This is the output I receive:

TypeError: int() argument must be a string or a number, not 'list'
0
20050

I have the following questions:

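  1. I do not understand why, in the code above, the length of output is 0 while the length of the variable m is 20050. As I understand it, both m and output should have length 20050.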

  2. Why is there a TypeError in this case? Why can't the argument to the int() function be a list?

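  3. When I tried to view the contents of the variable m in the debug window, my system almost crashed. (I am using Ubuntu 13.10 and running PyCharm 3.1 on it!) I could understand this if the list I was trying to view were huge, but in this case it is not. It is a list of 20050 lists, each containing 6 elements.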

  4. Any help in this regard would be highly appreciated.

2 answers:

Answer 0: (score: 0)

Just to address your error, partitioner calls:

int(element[1])

However, according to map_func1, element[1] is defined as age:

age = f[17:19] #18-19

This is a two-item list slice, which is itself a list, and so it is not a valid argument to int.

For the rest, I suggest you print a sample to see what is in it, e.g.

print m[:5]
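
To make the sampling suggestion concrete, here is a minimal sketch of that kind of inspection. It is written for Python 3, uses a plain map in a single process, and runs on made-up three-character records with shortened field offsets (all of these are assumptions, not the asker's data or layout). The mapper returns the shared accumulating list, as map_func1 does in the question, so the printout shows what int(element[1]) would actually receive under those assumptions.

```python
# Python 3 sketch; the sample lines and field offsets are hypothetical.
output = []  # module-level accumulator, as in the question


def map_func1(line):
    # Slicing a str yields a str, so each field here is text.
    gen = line[0:1]   # hypothetical offset: 1 character for gender
    age = line[1:3]   # hypothetical offset: 2 characters for age
    output.append([gen, age])
    return output     # returns the whole accumulator, not just this record


lines = ["125", "240", "163"]  # made-up fixed-width records
m = list(map(map_func1, lines))

# Print a small sample together with the types involved.
print(m[:5])
print(type(m[0]).__name__)        # one element of m
print(type(m[0][1]).__name__)     # what int(element[1]) would receive
print(type(m[0][1][1]).__name__)  # the age field inside a single record
```

Looking at the types alongside the values makes it easier to tell whether each element of m is one record or a list of records.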

Answer 1: (score: 0)

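The problem was that I was not returning things correctly from the mapper function. A slight change to the code made it work as required: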

import multiprocessing as mp
import itertools

partitioned = {}
partitioned['0-20'] = []
partitioned['20-40'] = []
partitioned['40-60'] = []
partitioned['60+'] = []

def map_func1(f):
    # for line in f:
    gen = f[14:15] #15 1=male 2=female
    age = f[17:19] #18-19
    htin = f[1947:1950] #1948-1950 tall in inches, self reported !888! !999!
    wtlbs = f[1950:1953] #1951-1953 wt in lbs, self reported !888! !999!
    ovwt = f[1963:1964] #1964 consider myself overweight 1,under 2,over 3, !8!, !9!
    chwt = f[1964:1965] #1965 change weight or stay same 1=more, 2=less, 3=same, !8!, !9!
    return [gen, age, htin, wtlbs, ovwt, chwt]

def partitioner(m):
    for element in m:
        if int(element[1]) < 20:
            partitioned['0-20'].append(element)
        elif int(element[1]) < 40:
            partitioned['20-40'].append(element)
        elif int(element[1]) < 60:
            partitioned['40-60'].append(element)
        else:
            partitioned['60+'].append(element)

    return partitioned

if __name__ == "__main__":
    pool = mp.Pool(processes=3)
    f = open('adult.dat')
    m = pool.map(map_func1, f)
    print m[0]
    p = partitioner(m)
    print len(p['60+'])
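
On the first question (why len(output) was 0 in the parent), here is a small sketch that may help. It is written for Python 3. The names record_and_return and run_demo are made up for illustration, and the explicit "fork" start method is an assumption that suits a POSIX system such as Ubuntu. Each worker process appends to its own copy of the module-level list, so the parent's copy stays empty. Only the values returned by the mapper come back through pool.map.

```python
import multiprocessing as mp

output = []  # module-level list; every worker process works on its own copy


def record_and_return(line):
    output.append(line)   # changes the worker's copy only
    return line.strip()   # the return value is what is sent back to the parent


def run_demo(lines):
    # Assumption: the "fork" start method, which is available on POSIX systems.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=3) as pool:
        results = pool.map(record_and_return, lines)
    # len(output) is measured in the parent process.
    return len(output), results


if __name__ == "__main__":
    parent_len, results = run_demo(["a\n", "b\n", "c\n"])
    print(parent_len)  # length of the parent's copy of output
    print(results)     # values returned by the workers, in input order
```

Returning each record from the mapper, as the corrected code above does, is what lets pool.map collect all of them in the parent.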