I have a program that finds the pointwise mutual information (PMI) of word pairs. My code is as follows:
import math

# Helper: total count of all pairs that contain the given word.
# Defined first so it exists before the main loop calls it.
def generate_sum_occurrence_values(item, temp_dict):
    sum_values = 0
    for key, value in temp_dict.iteritems():
        if ((item == key.split()[0]) or (item == key.split()[1])):
            sum_values += value
    return sum_values

# N is the total number of co-occurrences over all pairs.
temp_list = co_occurrence_dict.values()
N = 0
for item in temp_list:
    N += item

main_dict = {}
temp_dict = {}
for word in all_opinion_words:
    for key, value in co_occurrence_dict.iteritems():
        if ((word == key.split()[0]) or (word == key.split()[1])):
            pmi_eqn_numerator = (value) / float(N)
            # temp_word is the other word of the pair.
            if (key.split()[0] != word):
                temp_word = key.split()[0]
            else:
                temp_word = key.split()[1]
            if temp_word == word:
                temp_dict[temp_word] = 0
            else:
                temp_sum = generate_sum_occurrence_values(temp_word, co_occurrence_dict)
                pmi_eqn_denominator1 = (temp_sum) / float(N)
                temp_sum = generate_sum_occurrence_values(word, co_occurrence_dict)
                pmi_eqn_denominator2 = (temp_sum) / float(N)
                pmi = math.log(pmi_eqn_numerator / (float(pmi_eqn_denominator1) * float(pmi_eqn_denominator2)))
                temp_dict[temp_word] = pmi
    # Store the (word, pmi) pairs collected so far under the current word.
    temp_list = []
    for elements in temp_dict.iteritems():
        temp_list.append(elements)
    main_dict[word] = temp_list
Here, all_opinion_words is a list of words, and co_occurrence_dict has the form:
co_occurrence_dict = {
    'social contemporary': 1,
    'earthly indeed': 1,
    'far mythical': 1,
    'small higher': 1,
    'ideological even': 1,
    'certain `magnificent': 1,
    'back al': 8,
    'perhaps thin': 1,
    'never skeptical': 1,
    'federal small': 1,
    'difficult most': 1,
    'also young': 1,
    'ideological ever': 1,
    'far rather': 1,
    'able happy': 1,
}
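
To make the calculation concrete, here is a minimal sketch of the quantity the code above computes for a single pair, log(p(x, y) / (p(x) * p(y))), run on a three-entry sample in the same format. The names sample_counts, word_total and pmi_for_pair are hypothetical and used only for this illustration, and the sketch assumes every key is exactly two space-separated words.

```python
import math

# Tiny sample in the same "word1 word2" -> count format (illustrative data only).
sample_counts = {'far mythical': 1, 'far rather': 1, 'back al': 8}


def word_total(word, counts):
    """Sum the counts of every pair that contains `word` (hypothetical helper)."""
    return sum(v for k, v in counts.items() if word in k.split())


def pmi_for_pair(pair_key, counts):
    """PMI of one pair, log(p(x, y) / (p(x) * p(y))), using the natural log."""
    n = float(sum(counts.values()))      # total co-occurrences (N in the code above)
    w1, w2 = pair_key.split()            # assumes exactly two words per key
    p_xy = counts[pair_key] / n
    p_x = word_total(w1, counts) / n
    p_y = word_total(w2, counts) / n
    return math.log(p_xy / (p_x * p_y))


# 'far' occurs in 2 of 10 co-occurrences and 'mythical' in 1 of 10,
# so the ratio is 0.1 / (0.2 * 0.1) = 5 and the result is about log(5).
print(pmi_for_pair('far mythical', sample_counts))
```

With this sample, N is 10, so the pair 'far mythical' gives a value of roughly 1.609.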
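Since I have to process a large number of documents, all_opinion_words and co_occurrence_dict are both very large, so the program takes a long time to run. I have read that threads can be used to speed up execution. How can I apply threading to this code?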