Python multiprocessing "Bad file descriptor" error (not reproducible)

Time: 2014-05-20 19:51:31

Tags: python python-2.7 file-io multiprocessing

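Apologies in advance, but I can't post a complete example (there is too much overhead in this code to extract a runnable snippet). I'll post as much explanatory detail as I can; please let me know if anything important is missing.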

Running Python 2.7.5 via IDLE.

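I'm writing a program to compare two text files. Because the files can be large (~500MB) and each row comparison is independent, I'd like to use multiprocessing to speed up the comparison. This works well most of the time, but I'm hitting a pseudo-random Bad file descriptor error. I'm new to multiprocessing, so I suspect there is a technical problem with my implementation. Can someone point me in the right direction?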

Here is the code that causes the problem (specifically the pool.map call):

   # open files
   csvReaderTest = csv.reader(open(testpath, 'r'))
   csvReaderProd = csv.reader(open(prodpath, 'r'))    
   compwriter = csv.writer(open(outpath, 'wb'))

   pool = Pool()
   num_chunks = 3

   chunksTest = itertools.groupby(csvReaderTest, keyfunc)
   chunksProd = itertools.groupby(csvReaderProd, keyfunc)
   while True:
        # make a list of num_chunks chunks
        groupsTest = [list(chunk) for key, chunk in itertools.islice(chunksTest, num_chunks)]
        groupsProd = [list(chunk) for key, chunk in itertools.islice(chunksProd, num_chunks)]
        # merge the two lists (pair off comparison rows)
        groups_combined = zip(groupsTest,groupsProd)
        if groups_combined:
            # http://stackoverflow.com/questions/5442910/python-multiprocessing-pool-map-for-multiple-arguments
            a_args = groups_combined # a list - set of combinations to be tested
            second_arg = True
            worker_result = pool.map(worker_mini_star, itertools.izip(itertools.repeat(second_arg),a_args))
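
For readers following the batching logic, the loop above can be sketched in isolation, without the pool. This is only an illustration under assumptions: `paired_batches` is a hypothetical helper name, the rows are plain lists rather than `csv.reader` output, and the explicit `break` is my guess at how the loop ends, since the excerpt does not show its exit condition. It is written so that it also runs on Python 3, where `zip` returns an iterator and has to be wrapped in `list`.

```python
import itertools


def paired_batches(rows_test, rows_prod, keyfunc, num_chunks=3):
    """Yield lists of (test_group, prod_group) pairs, num_chunks pairs at a time.

    Hypothetical helper: mirrors the groupby/islice/zip steps of the loop
    above, with an explicit stop once no pairs are left.
    """
    chunks_test = itertools.groupby(rows_test, keyfunc)
    chunks_prod = itertools.groupby(rows_prod, keyfunc)
    while True:
        # Each group is copied into a list right away, because groupby
        # invalidates a group once the next one is requested.
        groups_test = [list(g) for _, g in itertools.islice(chunks_test, num_chunks)]
        groups_prod = [list(g) for _, g in itertools.islice(chunks_prod, num_chunks)]
        # Pair off the groups; zip truncates to the shorter list.
        batch = list(zip(groups_test, groups_prod))
        if not batch:
            break
        yield batch


# Illustrative data only: the first column is used as the grouping key.
demo_test = [["1", "a"], ["2", "b"], ["3", "c"], ["4", "d"]]
demo_prod = [["1", "a"], ["2", "x"], ["3", "c"], ["4", "d"]]
demo_batches = list(paired_batches(demo_test, demo_prod, lambda row: row[0]))
```

One thing the sketch makes visible: because `zip` truncates, groups left over in the longer file are dropped silently when the two files contain a different number of groups.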

Here is the full error output. (The error occurs only sometimes; at other times the comparison runs to completion without problems.):

Traceback (most recent call last):
  File "H:/<PATH_SNIP>/python_csv_compare_multiprocessing_rev02_test2.py", line 407, in <module>
    main(fileTest, fileProd, fileout, stringFields, checkFileLengths)
  File "H:/<PATH_SNIP>/python_csv_compare_multiprocessing_rev02_test2.py", line 306, in main
    worker_result = pool.map(worker_mini_star, itertools.izip(itertools.repeat(second_arg),a_args))
  File "C:\Python27\lib\multiprocessing\pool.py", line 250, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Python27\lib\multiprocessing\pool.py", line 554, in get
    raise self._value
IOError: [Errno 9] Bad file descriptor

In case it helps, here are the functions called by pool.map:

   def worker_mini(flag, chunk):
       row_comp = []
       for entry, entry2 in zip(chunk[0][0], chunk[1][0]):
           if entry == entry2:
               temp_comp = entry
           else:
               temp_comp = '%s|%s' % (entry, entry2)
           row_comp.append(temp_comp)
       return True, row_comp

   #takes a single tuple argument and unpacks the tuple to multiple arguments
   def worker_mini_star(flag_chunk):
       """Convert `f([1,2])` to `f(1,2)` call."""
       return worker_mini(*flag_chunk)

   def main():
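
As a self-contained illustration of the star-wrapper pattern used above, here is a minimal sketch. `compare_batch` and `demo_batch` are hypothetical names that do not appear in the original program, and the sketch is not offered as a diagnosis of the Errno 9. It shows how `pool.map`, which passes exactly one argument to the worker, can still drive a two-argument function, and it closes and joins the pool explicitly.

```python
import itertools
from multiprocessing import Pool


def worker_mini(flag, chunk):
    """Compare the first row of each group in a (test_group, prod_group) pair."""
    row_comp = []
    for entry, entry2 in zip(chunk[0][0], chunk[1][0]):
        # Keep matching fields as-is; join differing fields with '|'.
        row_comp.append(entry if entry == entry2 else '%s|%s' % (entry, entry2))
    return True, row_comp


def worker_mini_star(flag_chunk):
    """Unpack the single tuple that pool.map passes into two arguments."""
    return worker_mini(*flag_chunk)


def compare_batch(batch, processes=2):
    """Hypothetical helper: run worker_mini over one batch in a pool."""
    # zip stops when `batch` is exhausted, so the endless repeat() is safe.
    args = list(zip(itertools.repeat(True), batch))
    pool = Pool(processes)
    try:
        return pool.map(worker_mini_star, args)
    finally:
        pool.close()
        pool.join()


# Illustrative data only: two (test_group, prod_group) pairs.
demo_batch = [([["1", "a"]], [["1", "a"]]), ([["2", "b"]], [["2", "x"]])]

# On Windows, start the pool only from under the main guard, e.g.:
#
#     if __name__ == '__main__':
#         print(compare_batch(demo_batch))
```

The guard matters on Windows because child processes re-import the main module. The multiprocessing documentation also warns that `Pool` examples may not work from an interactive interpreter, so running the script from a command prompt rather than from IDLE may be worth trying.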

0 answers:

No answers