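I have the following program that processes files (around 3400, depending on the time of day). However, it seems to miss some of them: even if I feed it ~3400 files, it only processes ~3100, for example. Here is the code: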
import multiprocessing
from multiprocessing import Pool

def split_list(L, n):
    return [L[i::n] for i in xrange(n)]

def coreFunc(myarg):
    listlen = len(myarg)
    print "listlen = ", listlen
    for listiter in range(listlen):
        input1 = (myarg[listiter]).rstrip('\n')
        print "input1 = ", input1
    return 1

if __name__=="__main__":
    fptr = open("myfilelist")
    array = fptr.readlines()
    numC = multiprocessing.cpu_count()
    lists = split_list(array, numC)
    p = Pool(numC)
    p.map(coreFunc, lists)
    p.close()
    p.join()

"myfilelist" is a text file containing the names of those ~3400 files, like this:
/home/user/file1
/home/user/file2
/home/user/file3
….
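About 300 files are missed on every run. The missed files are not always the same; they differ from run to run.
Any idea why these files are being missed? I have verified that it has nothing to do with the files themselves by using a different set of files, by rearranging the file names in the file list, and so on, but nothing seems to make a difference. There are no error messages either.
Thanks.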
Answer 0 (score: 0)
I have made a version of your code that can be run directly. The revised code also adds per-process logging, which helps show what is going on.
Hope this helps!
import logging, multiprocessing
from multiprocessing import Pool

def split_list(L, n):
    return [L[i::n] for i in xrange(n)]

def coreFunc(mylist):
    proclog = multiprocessing.get_logger()
    proclog.info("listlen = %d", len(mylist))
    for path in mylist:
        proclog.info("input1 = %s", path)
    return 1

if __name__=="__main__":
    if 0:
        array = [line.rstrip() for line in open("myfilelist")]
    else:
        import string
        array = string.uppercase
    mylog = multiprocessing.log_to_stderr()
    mylog.setLevel(logging.INFO)
    numC = multiprocessing.cpu_count()
    lists = split_list(array, numC)
    p = Pool(numC)
    print p.map(coreFunc, lists)
    p.close()
    p.join()

[INFO/PoolWorker-1] child process calling self.run()
[INFO/PoolWorker-2] child process calling self.run()
[INFO/PoolWorker-4] child process calling self.run()
[INFO/PoolWorker-1] listlen = 7
[INFO/PoolWorker-1] input1 = A
[INFO/PoolWorker-1] input1 = E
[INFO/PoolWorker-1] input1 = I
[INFO/PoolWorker-1] input1 = M
[INFO/PoolWorker-1] input1 = Q
[INFO/PoolWorker-3] child process calling self.run()
[INFO/PoolWorker-1] input1 = U
[INFO/PoolWorker-1] input1 = Y
[INFO/PoolWorker-1] listlen = 6
[INFO/PoolWorker-1] input1 = D
[INFO/PoolWorker-4] listlen = 6
[INFO/PoolWorker-1] input1 = H
[INFO/PoolWorker-4] input1 = C
[INFO/PoolWorker-4] input1 = G
[INFO/PoolWorker-1] input1 = L
[INFO/PoolWorker-3] listlen = 7
[INFO/PoolWorker-1] input1 = P
[INFO/PoolWorker-4] input1 = K
[INFO/PoolWorker-3] input1 = B
[INFO/PoolWorker-4] input1 = O
[INFO/PoolWorker-1] input1 = T
[INFO/PoolWorker-1] input1 = X
[INFO/PoolWorker-4] input1 = S
[INFO/PoolWorker-3] input1 = F
[INFO/PoolWorker-4] input1 = W
[INFO/PoolWorker-3] input1 = J
[INFO/PoolWorker-3] input1 = N
[INFO/PoolWorker-3] input1 = R
[INFO/PoolWorker-3] input1 = V
[INFO/PoolWorker-3] input1 = Z
[INFO/PoolWorker-1] process shutting down
[INFO/PoolWorker-2] process shutting down
[INFO/PoolWorker-2] process exiting with exitcode 0
[INFO/PoolWorker-1] process exiting with exitcode 0
[INFO/PoolWorker-3] process shutting down
[INFO/PoolWorker-4] process shutting down
[INFO/PoolWorker-3] process exiting with exitcode 0
[INFO/PoolWorker-4] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[1, 1, 1, 1]
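
To tell whether entries are really being skipped, or whether printed lines are only getting lost when several processes write to the same stdout, one option is to have each worker return the paths it handled and compare them with the input in the parent process. Below is a minimal sketch, written so that it should run on both Python 2 and Python 3. The names core_func and find_missing and the generated /home/user/fileN paths are made up for illustration; they stand in for your real worker and for the contents of myfilelist.

```python
from __future__ import print_function  # keeps print() working on Python 2 as well

import multiprocessing
from collections import Counter


def split_list(items, n):
    # Same striding split as in the question: chunk i gets items i, i+n, i+2n, ...
    return [items[i::n] for i in range(n)]


def core_func(chunk):
    # Hypothetical worker: instead of printing, hand back every path it handled
    # so the parent process can account for each one.
    handled = []
    for line in chunk:
        path = line.rstrip("\n")
        # ... the real per-file work would go here ...
        handled.append(path)
    return handled


def find_missing(expected, chunks_done):
    # Multiset difference: anything expected but not reported back by a worker.
    done = Counter(path for chunk in chunks_done for path in chunk)
    return sorted((Counter(expected) - done).elements())


if __name__ == "__main__":
    # Stand-in input (assumption); the real lines would come from "myfilelist".
    lines = ["/home/user/file%d\n" % i for i in range(1, 3401)]
    expected = [line.rstrip("\n") for line in lines]

    num_workers = multiprocessing.cpu_count()
    chunks = split_list(lines, num_workers)
    # The split itself should neither drop nor duplicate entries.
    assert sum(len(c) for c in chunks) == len(lines)

    pool = multiprocessing.Pool(num_workers)
    results = pool.map(core_func, chunks)
    pool.close()
    pool.join()

    missing = find_missing(expected, results)
    print("expected %d, handled %d, missing %d"
          % (len(expected), sum(len(r) for r in results), len(missing)))
```

If missing comes back empty, the pool handled every entry, and the discrepancy is more likely in how the output is captured or counted than in the processing itself.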