Python multiprocessing does not process all items in a list

Time: 2014-07-23 00:45:29

Tags: python multiprocessing

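I have the following program that processes files (about 3400, depending on the time of day). However, it seems to miss some of them: even when I feed it ~3400 files, it processes only ~3100, for example. Here is the code: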

import multiprocessing
from multiprocessing import Pool

def split_list(L, n):
    return [L[i::n] for i in xrange(n)]

def coreFunc(myarg):

    listlen = len(myarg)
    print "listlen = ", listlen

    for listiter in range(listlen):
        input1 = (myarg[listiter]).rstrip('\n')
        print "input1 = ", input1

    return 1

if __name__=="__main__":

    fptr = open("myfilelist")
    array = fptr.readlines()

    numC = multiprocessing.cpu_count()
    lists = split_list(array, numC)
    p = Pool(numC)
    p.map(coreFunc, lists)

    p.close()
    p.join()

"myfilelist" is a text file containing the names of those ~3400 files, like this:

    /home/user/file1
    /home/user/file2
    /home/user/file3
    ….
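Because every line of the list becomes one work item, blank lines or stray whitespace would count toward the ~3400 total. A minimal sketch of reading the list defensively is shown here. It is written for Python 3 (the posted program is Python 2), and `read_file_list` and the sample contents are hypothetical, not part of the original program.

```python
import os
import tempfile


def read_file_list(list_path):
    """Return the non-empty, whitespace-stripped lines of a file list."""
    with open(list_path) as handle:  # the file is closed automatically on exit
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    # Hypothetical sample standing in for "myfilelist"; note the blank line.
    sample = "/home/user/file1\n/home/user/file2\n\n/home/user/file3\n"
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(sample)
    try:
        entries = read_file_list(tmp.name)
        print(len(entries), entries)
    finally:
        os.remove(tmp.name)
```

Comparing the length of this list with the expected file count is a quick way to rule out the input file as the source of a discrepancy.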

About 300 files are missed on every run. The missed files are not always the same; they differ from run to run.

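Any idea why these files are being missed? I verified that it has nothing to do with the files themselves by using a different set of files, by rearranging the file names in "myfilelist", and so on, but nothing seems to make any difference. There are no error messages either.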
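One way to check whether items are really skipped, independently of what appears in the printed output, is to have each worker return the paths it handled and compare them with the input in the parent process. The following is a minimal sketch under that assumption, written for Python 3. The names `core_func`, `find_missing`, and `run`, as well as the synthetic paths, are hypothetical and only stand in for the posted program and its "myfilelist".

```python
import multiprocessing


def split_list(items, n):
    """Deal items into n interleaved sublists (same striding as the posted code)."""
    return [items[i::n] for i in range(n)]


def core_func(paths):
    """Return the cleaned paths this worker handled instead of printing them."""
    return [p.rstrip("\n") for p in paths]


def find_missing(expected, processed):
    """Return the entries of expected that never appear in processed, in order."""
    seen = set(processed)
    return [p for p in expected if p not in seen]


def run(paths, workers):
    """Process paths in a pool and return one flat list of handled paths."""
    with multiprocessing.Pool(workers) as pool:
        results = pool.map(core_func, split_list(paths, workers))
    return [p for chunk in results for p in chunk]


if __name__ == "__main__":
    # Synthetic stand-in for the lines read from "myfilelist".
    paths = ["/home/user/file%d\n" % i for i in range(1, 3401)]
    processed = run(paths, multiprocessing.cpu_count())
    expected = [p.rstrip("\n") for p in paths]
    print("processed:", len(processed))
    print("missing:", find_missing(expected, processed))
```

If the missing list comes back empty while the printed output still appears short, the discrepancy lies in how the output is collected or counted rather than in the pool itself.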

Thanks.

1 answer:

Answer 0 (score: 0)

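I have made a directly runnable version of the code. This revised version also adds per-process logging, which helps show what is going on. With the "if 0:" branch disabled, it uses the 26 uppercase letters as stand-in input instead of reading "myfilelist".

Hope this helps!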

import logging, multiprocessing
from multiprocessing import Pool

def split_list(L, n):
    return [L[i::n] for i in xrange(n)]

def coreFunc(mylist):
    proclog = multiprocessing.get_logger()

    proclog.info("listlen = %d", len(mylist))
    for path in mylist:
        proclog.info("input1 = %s", path)

    return 1


if __name__=="__main__":

    if 0:
        array = [line.rstrip() for line in open("myfilelist")]
    else:
        import string
        array = string.uppercase

    mylog = multiprocessing.log_to_stderr()
    mylog.setLevel(logging.INFO)

    numC = multiprocessing.cpu_count()
    lists = split_list(array, numC)

    p = Pool(numC)
    print p.map(coreFunc, lists)
    p.close()
    p.join()

Output

[INFO/PoolWorker-1] child process calling self.run()
[INFO/PoolWorker-2] child process calling self.run()
[INFO/PoolWorker-4] child process calling self.run()
[INFO/PoolWorker-1] listlen = 7
[INFO/PoolWorker-1] input1 = A
[INFO/PoolWorker-1] input1 = E
[INFO/PoolWorker-1] input1 = I
[INFO/PoolWorker-1] input1 = M
[INFO/PoolWorker-1] input1 = Q
[INFO/PoolWorker-3] child process calling self.run()
[INFO/PoolWorker-1] input1 = U
[INFO/PoolWorker-1] input1 = Y
[INFO/PoolWorker-1] listlen = 6
[INFO/PoolWorker-1] input1 = D
[INFO/PoolWorker-4] listlen = 6
[INFO/PoolWorker-1] input1 = H
[INFO/PoolWorker-4] input1 = C
[INFO/PoolWorker-4] input1 = G
[INFO/PoolWorker-1] input1 = L
[INFO/PoolWorker-3] listlen = 7
[INFO/PoolWorker-1] input1 = P
[INFO/PoolWorker-4] input1 = K
[INFO/PoolWorker-3] input1 = B
[INFO/PoolWorker-4] input1 = O
[INFO/PoolWorker-1] input1 = T
[INFO/PoolWorker-1] input1 = X
[INFO/PoolWorker-4] input1 = S
[INFO/PoolWorker-3] input1 = F
[INFO/PoolWorker-4] input1 = W
[INFO/PoolWorker-3] input1 = J
[INFO/PoolWorker-3] input1 = N
[INFO/PoolWorker-3] input1 = R
[INFO/PoolWorker-3] input1 = V
[INFO/PoolWorker-3] input1 = Z
[INFO/PoolWorker-1] process shutting down
[INFO/PoolWorker-2] process shutting down
[INFO/PoolWorker-2] process exiting with exitcode 0
[INFO/PoolWorker-1] process exiting with exitcode 0
[INFO/PoolWorker-3] process shutting down
[INFO/PoolWorker-4] process shutting down
[INFO/PoolWorker-3] process exiting with exitcode 0
[INFO/PoolWorker-4] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[1, 1, 1, 1]
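The log above accounts for all 26 letters: the four listlen values (7, 6, 6, 7) sum to 26. The same property can be checked without a pool. The sketch below, written for Python 3, confirms that the striding in split_list hands out every item exactly once, assuming four sublists as in the logged run. The helper `covers_exactly_once` is hypothetical and not part of the answer's code.

```python
import string


def split_list(items, n):
    """Deal items into n interleaved sublists: items[0::n], items[1::n], ..."""
    return [items[i::n] for i in range(n)]


def covers_exactly_once(items, sublists):
    """True if the sublists together contain each item exactly once."""
    flat = [x for sub in sublists for x in sub]
    return sorted(flat) == sorted(items)


letters = string.ascii_uppercase
parts = split_list(letters, 4)

print([len(p) for p in parts])              # [7, 7, 6, 6]
print(covers_exactly_once(letters, parts))  # True
```

Since the split itself loses nothing, a shortfall in the observed count would have to come from somewhere else, such as the contents of the input list or the way the output lines are counted.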