I used a keyword extractor and got the following list:
[('solutions design team', 0.5027793039863974),
('communication skills', 0.039048703166463736),
('internal stakeholders', 0.03230578820017667),
('potential customers', 0.020380881551651655), ('utilize', 0.002776174060064261)]
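I am trying to split each phrase into its individual words and assign each word the value of its phrase (the number on the right).
For example, 'solutions design team' = 0.5027793039863974 should become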
'solutions' = 0.5027793039863974,
'design' = 0.5027793039863974 ,
'team' = 0.5027793039863974.
Answer 0 (score: 1)
One way to rebuild the list of tuples with the words separated is a flattening list comprehension with two for clauses:
inlist = [('solutions design team', 0.5027793039863974),
('communication skills', 0.039048703166463736),
('internal stakeholders', 0.03230578820017667),
('potential customers', 0.020380881551651655), ('utilize', 0.002776174060064261)]
outlist = [(word,value) for words,value in inlist for word in words.split()]
Result:
>>> outlist
[('solutions', 0.5027793039863974),
('design', 0.5027793039863974),
('team', 0.5027793039863974),
('communication', 0.039048703166463736),
('skills', 0.039048703166463736),
('internal', 0.03230578820017667),
('stakeholders', 0.03230578820017667),
('potential', 0.020380881551651655),
('customers', 0.020380881551651655),
('utilize', 0.002776174060064261)]
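Note that if a word appears more than once, the list of tuples will contain duplicates. If you want to accumulate their values, a collections.defaultdict(float)
object is a convenient way to build a dictionary of word => accumulated value.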
import collections

accumulated = collections.defaultdict(float)
for word, value in outlist:
    accumulated[word] += value
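The sample list in the question has no repeated words, so the accumulation is not visible there. As a minimal, self-contained sketch, the snippet below uses a made-up input (the `sample` list and its scores are hypothetical, not taken from the question) in which 'design' appears in two phrases:

```python
import collections

# Hypothetical input: 'design' occurs in two phrases.
sample = [('solutions design team', 0.5), ('design skills', 0.25)]

# Flatten into (word, value) pairs, one pair per word of each phrase.
pairs = [(word, value) for words, value in sample for word in words.split()]

# Sum the values of repeated words; a missing key starts at 0.0.
accumulated = collections.defaultdict(float)
for word, value in pairs:
    accumulated[word] += value

print(dict(accumulated))
# {'solutions': 0.5, 'design': 0.75, 'team': 0.5, 'skills': 0.25}
```

Whether summing is the right way to combine scores depends on what the extractor's values mean; it is shown here only because it is what defaultdict(float) makes easy.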
Answer 1 (score: 0)
A dictionary comprehension combined with dict.update may help, as shown below:
corpus = [('solutions design team', 0.5027793039863974), ('communication skills', 0.039048703166463736), ('internal stakeholders', 0.03230578820017667), ('potential customers', 0.020380881551651655), ('utilize', 0.002776174060064261)]
final_dict = {}
for phase, prob in corpus:
    final_dict.update({word: prob for word in phase.split()})
print(final_dict['solutions'])
print(final_dict['design'])
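One behavior worth knowing with this approach: dict.update overwrites existing keys, so if a word appears in more than one phrase, the value from the later phrase wins. The sketch below illustrates this with hypothetical data (the `sample` list is made up for the example), and shows keeping the highest score as one possible alternative, not something the answer itself prescribes:

```python
# Hypothetical input: 'design' occurs in two phrases with different scores.
sample = [('solutions design team', 0.5), ('design skills', 0.25)]

# update() overwrites: the last phrase containing a word decides its value.
last_wins = {}
for phrase, prob in sample:
    last_wins.update({word: prob for word in phrase.split()})

# Alternative: keep the highest score seen for each word.
highest = {}
for phrase, prob in sample:
    for word in phrase.split():
        highest[word] = max(prob, highest.get(word, prob))

print(last_wins['design'])  # 0.25
print(highest['design'])    # 0.5
```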