Question

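I have a function with no side effects. I want to run it on every element of an array and return an array containing all of the results.

Does Python have something to generate all of the values?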

Answer 1

Try the Pool.map function from multiprocessing:

http://docs.python.org/library/multiprocessing.html#using-a-pool-of-workers

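This isn't multithreaded per se, but that's actually a good thing, since the GIL severely cripples multithreading in Python for CPU-bound work.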
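For illustration, here is a minimal sketch of Pool.map. The function name `square` and the pool size of 4 are placeholders I made up, not anything from the question; the worker function has to live at module top level so the worker processes can import it.

```python
from multiprocessing import Pool


def square(x):
    """Hypothetical stand-in for the side-effect-free function."""
    return x * x


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # when they import the main module.
    with Pool(processes=4) as pool:
        # pool.map blocks until every item is done and returns a list
        # in the same order as the input.
        results = pool.map(square, range(10))
    print(results)
```

Arguments and results are pickled to cross the process boundary, so this probably pays off only when each call does a meaningful amount of work.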

Answer 2

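You can use the multiprocessing Python package (http://docs.python.org/library/multiprocessing.html). The cloud Python package from PiCloud (http://www.picloud.com) also offers a multiprocessing map() function, which can offload your map to the cloud.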

Answer 3

Python now has the concurrent.futures module, which is the simplest way to get map working with either multiple threads or multiple processes.

https://docs.python.org/3/library/concurrent.futures.html
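A minimal sketch of what that might look like with threads. The helper name `parallel_map` and the default of 8 workers are my own choices for illustration, not part of the module.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, items, max_workers=8):
    """Apply func to every item on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns an iterator; list() waits for all results.
        return list(executor.map(func, items))


print(parallel_map(len, ["a", "bb", "ccc"]))
```

For CPU-bound functions, swapping in concurrent.futures.ProcessPoolExecutor should work the same way, provided the function is picklable and the call sits under an `if __name__ == "__main__":` guard.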

Answer 4

Below is my map_parallel function. It works just like map, except that it runs each element in parallel in a separate thread (but see the note below). This answer builds upon another SO answer.

import logging
import sys
import threading
import traceback
def map_parallel(f, iterable, max_parallel=10):
    """Just like map(f, iterable) but each is done in a separate thread."""
    # Put all of the items in the queue, keep track of order.
    from queue import Queue, Empty
    total_items = 0
    queue = Queue()
    for i, arg in enumerate(iterable):
        queue.put((i, arg))
        total_items += 1
    # No point in creating more thread objects than necessary.
    if max_parallel > total_items:
        max_parallel = total_items

    # The worker thread.
    res = {}
    errors = {}
    class Worker(threading.Thread):
        def run(self):
            while not errors:
                try:
                    num, arg = queue.get(block = False)
                    try:
                        res[num] = f(arg)
                    except Exception as e:
                        errors[num] = sys.exc_info()
                except Empty:
                    break

    # Create the threads.
    threads = [Worker() for _ in range(max_parallel)]
    # Start the threads.
    for t in threads:
        t.start()
    # Wait for the threads to finish.
    for t in threads:
        t.join()

    if errors:
        if len(errors) > 1:
            logging.warning("map_parallel multiple errors: %d:\n%s"%(
                len(errors), errors))
        # Just raise the first one.
        item_i = min(errors.keys())
        _exc_type, value, tb = errors[item_i]
        # Print the original traceback
        logging.info("map_parallel exception on item %s/%s:\n%s"%(
            item_i, total_items, "\n".join(traceback.format_tb(tb))))
        raise value
    return [res[i] for i in range(len(res))]

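Note: one thing to be careful of is exceptions. Like the normal map, the function above raises an exception if one of the sub-threads raises one, and it stops iterating. However, because of the parallelism, there is no guarantee that the earliest element will be the one to raise the first exception.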
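For contrast, the standard library's ThreadPoolExecutor.map hands results back in input order, so the exception the caller sees is the one from the earliest failing item. A small sketch of that behaviour; `flaky` and `first_error` are hypothetical names made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def flaky(n):
    """Hypothetical worker that fails on odd inputs."""
    if n % 2:
        raise ValueError(f"item {n} failed")
    return n * 10


def first_error(items):
    """Return the message of the exception that executor.map surfaces."""
    with ThreadPoolExecutor(max_workers=4) as executor:
        try:
            # Results are consumed in input order, so the first failing
            # item in that order is the one whose exception is re-raised.
            list(executor.map(flaky, items))
        except ValueError as exc:
            return str(exc)
    return None


print(first_error([0, 1, 2, 3]))  # item 1 failed
```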

Answer 5

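Maybe try the Unladen Swallow Python 3 implementation? That might be a major project, and it isn't guaranteed to be stable, but if you're so inclined it could work. Then list or set comprehensions seem like the proper functional construct to use.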

Answer 6

Try concurrent.futures.ThreadPoolExecutor.map in the Python standard library (new in version 3.2).

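Similar to map(func, *iterables), except:

the iterables are collected immediately rather than lazily;

func is executed asynchronously, and several calls to func may be made concurrently.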
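Both differences can be observed with a small sketch like the one below, which assumes the default arguments to map. The names `numbers`, `slow_double`, and `consumed` are invented for the demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

consumed = []


def numbers():
    """Generator that records each item as it is pulled."""
    for n in range(4):
        consumed.append(n)
        yield n


def slow_double(n):
    # Later items finish sooner, to show results still come back in order.
    time.sleep(0.05 * (3 - n))
    return n * 2


with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(slow_double, numbers())
    # The generator has already been drained here, before any result is read.
    consumed_before_iterating = list(consumed)
    doubled = list(results)

print(consumed_before_iterating)  # [0, 1, 2, 3]
print(doubled)                    # [0, 2, 4, 6]
```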

A simple example (modified from the ThreadPoolExecutor Example):

import concurrent.futures
import urllib.request

URLS = [
  'http://www.foxnews.com/',
  'http://www.cnn.com/',
  'http://europe.wsj.com/',
  'http://www.bbc.co.uk/',
]

# Retrieve a single page and return its contents
def load_url(url, timeout):
    # Do something here
    # For example
    try:
        # urlopen can fail as well as read, so both sit inside the try.
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return conn.read()
    except Exception:
        # You may need a better error handler.
        return b''

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    # map returns the page contents in the same order as URLS
    pages = list(executor.map(lambda url: load_url(url, 60), URLS))

print('Done.')

Answer 7

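I see no reason to have such a function. Because of the GIL, only one Python thread executes bytecode at a time in CPython. Assuming your map function has no I/O component, you won't see any speedup (and will probably see a slowdown due to context switching).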

Other posters have mentioned multiprocessing, which is probably a better idea.
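The I/O caveat matters in practice: blocking calls release the GIL, so a threaded map can still help there. A rough sketch, where `fake_io` is a hypothetical task that uses time.sleep as a stand-in for a blocking network or disk call; exact timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(n):
    """Hypothetical I/O-bound task; sleep stands in for a blocking call."""
    time.sleep(0.1)
    return n


def timed(run):
    """Call run() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = run()
    return result, time.perf_counter() - start


items = list(range(8))

# Plain map: the eight waits happen one after another.
serial, serial_secs = timed(lambda: list(map(fake_io, items)))

# Threaded map: the waits overlap because sleeping threads release the GIL.
with ThreadPoolExecutor(max_workers=8) as executor:
    threaded, threaded_secs = timed(lambda: list(executor.map(fake_io, items)))

print(f"serial: {serial_secs:.2f}s, threaded: {threaded_secs:.2f}s")
```

Replacing the sleep with a pure-Python computation should remove most of that gap, which is the case this answer is describing.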

Answer 8

This functionality is not built in. However, someone has already implemented it.

Is there a multithreaded map() function?
