I've run into an interesting problem with map_async that I can't figure out.
I'm using the Python multiprocessing library with a process pool. I'm trying to pass a list of strings to compare, along with a second list of strings to compare them against, to a function invoked via map_async().
Right now I have:
from multiprocessing import Pool, cpu_count
import functools

fdict = '/a/file/on/my/disk'
passin = '/another/file/on/my/disk'
num_proc = cpu_count()
pool = Pool(processes=num_proc)

dictionary = readFiletoList(fdict)
dictionary = sortByLength(dictionary)
words = readFiletoList(passin, 'WINDOWS-1252')
words = sortByLength(words)

result = pool.map_async(functools.partial(mpmine, dictionary=dictionary), [words], 1000)
def readFiletoList(fname, fencode='utf-8'):
    linelist = list()
    with open(fname, encoding=fencode) as f:
        for line in f:
            linelist.append(line.strip())
    return linelist
def sortByLength(words):
    '''Takes an ordered iterable and sorts it based on word length.'''
    return sorted(words, key=len)
def mpmine(word, dictionary):
    '''Takes a tuple of length 2 as its arguments.
    At least dictionary needs to be sorted by word length. If not, wacky results ensue.
    '''
    results = dict()
    for pw in word:
        pwlen = len(pw)
        pwres = list()
        for word in dictionary:
            if len(word) > pwlen:
                break
            if word in pw:
                pwres.append(word)
        if len(pwres) > 0:
            results[pw] = pwres
    return results
if __name__ == '__main__':
    main()
Both dictionary and words are lists of strings. As written, this uses only one process instead of the number I set. If I take the square brackets off the variable 'words', it seems to iterate over the characters of each string in turn and wreak havoc.
What I want to happen is for it to take 1000 strings at a time, hand them off to a worker process, and get the results back, since this is an embarrassingly parallel task.
Edit: added more code to make things clearer.
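The chaos described above comes from string iteration: with [words] the pool's iterable has a single element (the whole list), so only one task is created, while without the brackets each task receives one string, and a for loop over a string yields individual characters. A minimal sketch of both behaviours (toy data, no pool needed to demonstrate it):

```python
words = ['hello', 'world']

# With [words], the iterable handed to map_async has ONE element,
# so the pool creates only one task for the entire list:
wrapped = [words]
assert len(wrapped) == 1 and wrapped[0] == ['hello', 'world']

# Without the brackets, each task receives a single string; iterating
# over that string inside the worker yields characters, not words:
first_task_arg = words[0]
assert [ch for ch in first_task_arg] == ['h', 'e', 'l', 'l', 'o']
```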
Answer (score: 2)
OK, I actually figured this out myself. I'll post the answer here for anyone else who comes along with the same problem. The reason I was having trouble is that map_async takes one item from the list (in this case, a string) and feeds it to the function, which was expecting a list of strings. So it then essentially treated each string as a list of characters. The corrected code for mpmine is:
def mpmine(word, dictionary):
    '''Takes a single string and a dictionary of words.
    At least dictionary needs to be sorted by word length. If not, wacky results ensue.
    '''
    results = dict()
    pw = word
    pwlen = len(pw)
    pwres = list()
    for word in dictionary:
        if len(word) > pwlen:
            break
        if word in pw:
            pwres.append(word)
    if len(pwres) > 0:
        results[pw] = pwres
    return results
I hope this helps anyone else facing a similar problem.
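Putting the pieces together, here is a sketch of a complete driver under the same idea: words is passed directly (no brackets) so each string becomes one task, and chunksize does the batching. The in-memory toy data stands in for the files from the question, and the substring scan is written as a comprehension rather than the sorted-dictionary early-break loop, so the per-call logic is simplified but equivalent:

```python
from multiprocessing import Pool, cpu_count
import functools

def mpmine(pw, dictionary):
    # One string per call, as map_async delivers it.
    results = {}
    pwres = [w for w in dictionary if len(w) <= len(pw) and w in pw]
    if pwres:
        results[pw] = pwres
    return results

if __name__ == '__main__':
    # Toy data standing in for the files read in the question.
    dictionary = sorted(['a', 'cat', 'dog', 'catalog'], key=len)
    words = sorted(['catalogue', 'dogma', 'xyz'], key=len)

    with Pool(processes=cpu_count()) as pool:
        # Each string in `words` is one task; chunksize groups tasks
        # into batches handed to each worker process.
        async_res = pool.map_async(
            functools.partial(mpmine, dictionary=dictionary),
            words, chunksize=2)
        per_word = async_res.get()  # one dict per input string

    # Merge the per-string dicts into one combined result.
    merged = {}
    for d in per_word:
        merged.update(d)
    print(merged)
```

Each element of per_word corresponds to one input string, so empty dicts (no substrings found) simply contribute nothing to the merge.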