I have a function that returns the "support" of a given itemset in a list of transactions, i.e. the frequency with which the itemset appears in the rows shown below:
def support_tuple(items):
    # Fraction of transactions (rows) that contain the given itemset.
    count = float(sum([1 for row in rows_tuple if (items in row)]))
    supp = count/n_rows
    return (items, supp)
if __name__ == "__main__":
    from multiprocessing.dummy import Pool as ThreadPool
    import multiprocessing as mp
    pairs = [('apple', 'banana'), ('cookie', 'popsicle'), ('candy', 'cookie'), ...]
    # grocery transaction data
    rows_tuple = [
        {('margarine', 'margarine'), ('citrus', 'semi-finished'), ('bread', 'bread'), ('citrus', 'citrus')},
        {('bread', 'fruit'), ('citrus', 'margarine'), ('ready', 'bread'), ('semi-finished', 'fruit'), ('soups', 'margarine'), ('margarine', 'soups')},
        {('fruit', 'margarine'), ... },
    ]
    # Assumption: n_rows was not defined in the snippet; it is taken to be the number of transactions.
    n_rows = len(rows_tuple)
    # Approach 1: plain list comprehension
    res_list_comprehension = [support_tuple(pair) for pair in pairs]
    # Approach 2: thread pool with one worker thread per CPU core
    threadpool = ThreadPool(mp.cpu_count())
    res_threading = threadpool.map(support_tuple, pairs, chunksize=100)
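In reality, rows_tuple has a length of 18000 and pairs has a length of 9000, but my main question is: why does the list comprehension outperform threading in this case? Am I completely missing something about threading that could greatly improve the speed?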
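
To make the comparison reproducible without the full grocery data, here is a minimal sketch that times both approaches on synthetic transactions. The item list, the data sizes, and the helper names make_rows, make_support_fn and compare are assumptions made up for illustration and are not part of my real code; only the support calculation and the ThreadPool.map call mirror the snippet above.

```python
import multiprocessing as mp
import random
import time
from itertools import combinations
from multiprocessing.dummy import Pool as ThreadPool

# Hypothetical item names standing in for the real grocery data.
ITEMS = ['apple', 'banana', 'bread', 'candy', 'citrus',
         'cookie', 'fruit', 'margarine', 'popsicle', 'soups']


def make_rows(n_rows, pairs_per_row, seed=0):
    """Build synthetic transactions: each row is a set of item pairs."""
    rng = random.Random(seed)
    all_pairs = list(combinations(ITEMS, 2))
    return [set(rng.sample(all_pairs, pairs_per_row)) for _ in range(n_rows)]


def make_support_fn(rows):
    """Return a one-argument support function bound to the given rows."""
    n_rows = len(rows)

    def support_tuple(items):
        # Fraction of rows that contain the pair `items`.
        count = sum(1 for row in rows if items in row)
        return (items, count / n_rows)

    return support_tuple


def compare(rows, pairs, n_threads=None):
    """Run both approaches and return their results and wall-clock times."""
    support_tuple = make_support_fn(rows)

    start = time.perf_counter()
    res_list_comprehension = [support_tuple(pair) for pair in pairs]
    t_list_comprehension = time.perf_counter() - start

    # Pool creation is left outside the timed region; only map() is timed.
    with ThreadPool(n_threads or mp.cpu_count()) as pool:
        start = time.perf_counter()
        res_threading = pool.map(support_tuple, pairs, chunksize=100)
        t_threading = time.perf_counter() - start

    return res_list_comprehension, res_threading, t_list_comprehension, t_threading


if __name__ == "__main__":
    rows = make_rows(n_rows=2000, pairs_per_row=4)
    pairs = list(combinations(ITEMS, 2))
    res_lc, res_thr, t_lc, t_thr = compare(rows, pairs)
    print("results identical:", res_lc == res_thr)
    print("list comprehension: %.4f s" % t_lc)
    print("thread pool map:    %.4f s" % t_thr)
```

The printed timings depend on the machine and on the sizes chosen, so I am not quoting numbers here. Scaling n_rows and the number of pairs toward 18000 and 9000 should get closer to my real workload.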