Using a local dictionary with Python multithreading to speed up a process

Date: 2018-05-01 02:48:32

Tags: python multithreading

My process is very slow, and I would like to speed it up with multithreading. The goal is to read a very large dataset, run an expensive computation on each row, and store the result in a dictionary. I want to use multithreading, but I don't know how to apply it here. This is my attempt:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from multiprocessing.pool import ThreadPool

def findTweets(side):
    cosine_dict = {}
    for t in tweets:  # tweets is a global list of strings
        topic = [side, t]
        tfidf_vectorizer = TfidfVectorizer()
        topic_matrix = tfidf_vectorizer.fit_transform(topic)
        cosine = cosine_similarity(topic_matrix[0:1], topic_matrix[1:2])
        cosine = float(cosine)
        key = side + "&&" + t
        cosine_dict[key] = cosine
    return cosine_dict

left = []  # just some strings

for l in left:
    pool = ThreadPool(processes=10)
    result = pool.apply_async(findTweets, (l,))
    cosine_dict_left = result.get()

This doesn't seem to improve performance at all. How can I apply multithreading here to speed up the process?

1 answer:

Answer 0 (score: 0)

result.get() is a blocking call, so you only ever have one task running at a time. A quick-and-dirty fix is:

left = []  # just some strings
results = []
pool = ThreadPool(processes=10)

for l in left:
    results.append(pool.apply_async(findTweets, (l,)))

for result in results:
    cosine_dict_left = result.get()
    # Do something with cosine_dict_left
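The same submit-all-then-collect idea reads more cleanly with pool.map, which fans the work out and gathers the results in one call. One caveat: TF-IDF plus cosine similarity is CPU-bound, so threads gain little under CPython's GIL; multiprocessing.Pool has the same API and runs the workers in separate processes instead. A minimal sketch of the pattern, with the expensive TF-IDF computation replaced by a placeholder find_tweets function and stand-in data (both hypothetical, not from the original code):

```python
from multiprocessing.pool import ThreadPool
# For CPU-bound work, "from multiprocessing import Pool" is a drop-in
# replacement that uses processes instead of threads.

def find_tweets(side):
    # Placeholder for the expensive per-side computation: returns one
    # {key: score} dict per input, shaped like the original findTweets.
    tweets = ["t1", "t2"]  # stand-in for the global tweet list
    return {side + "&&" + t: float(len(side + t)) for t in tweets}

left = ["a", "bb"]  # stand-in for the real list of strings

with ThreadPool(processes=10) as pool:
    # map() submits every element of left and blocks until all are done,
    # returning one result dict per input, in order.
    partial_dicts = pool.map(find_tweets, left)

# Merge the per-side dictionaries into a single result dictionary.
cosine_dict_left = {}
for d in partial_dicts:
    cosine_dict_left.update(d)
```

Because all keys already embed the side string ("side&&tweet"), merging the partial dictionaries loses nothing, and the final cosine_dict_left covers every (side, tweet) pair.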