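Edit: "Please focus only on answers to the example below, not on the broader scenario."
OK. I have read articles about word clouds, but I would like to know how to represent the words that most often occur together in a string variable, as in the example below.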
Var_x
wireless problems, migration to competitor
dissatisfied customers, technicians visits scheduled
call waiting, technicians visits
bad customer experience, wireless problems
So what I want is for the phrases "wireless problems" and "technicians visits" to be represented in the cloud. How can I do that?
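Answer 0 (score: 3)
This code produces a frequency distribution of adjacent word pairs (bigrams), which can serve as the underlying data for a word cloud: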
from operator import itemgetter

from nltk import bigrams, FreqDist
from nltk.tokenize import RegexpTokenizer

sent = ('wireless problems, migration to competitor\n'
        'dissatisfied customers, technicians visits scheduled\n'
        'call waiting, technicians visits\n'
        'bad customer experience, wireless problems')

# Split into word tokens, dropping commas and line breaks.
tokenizer = RegexpTokenizer(r'\w+')
sent_words = tokenizer.tokenize(sent)

# Count each pair of adjacent words. Pairs can span commas and line breaks.
freq_dist = FreqDist(bigrams(sent_words))

# Print the pairs from most to least frequent.
for k, v in sorted(freq_dist.items(), key=itemgetter(1), reverse=True):
    print(k, v)
Output
('technicians', 'visits') 2
('wireless', 'problems') 2
('dissatisfied', 'customers') 1
('bad', 'customer') 1
('scheduled', 'call') 1
('competitor', 'dissatisfied') 1
('migration', 'to') 1
('to', 'competitor') 1
('visits', 'scheduled') 1
('call', 'waiting') 1
('problems', 'migration') 1
('waiting', 'technicians') 1
('customers', 'technicians') 1
('customer', 'experience') 1
('experience', 'wireless') 1
('visits', 'bad') 1
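The output above also contains pairs such as ('scheduled', 'call') that straddle a comma or a line break, because the whole string is tokenized at once. If only pairs inside a single phrase should count, one possible variation is sketched below using just the standard library. It assumes that commas and line breaks are the phrase separators; the function name phrase_bigram_counts is made up for this illustration and is not part of the answer above.

```python
import re
from collections import Counter

text = """wireless problems, migration to competitor
dissatisfied customers, technicians visits scheduled
call waiting, technicians visits
bad customer experience, wireless problems"""


def phrase_bigram_counts(raw):
    """Count adjacent word pairs without crossing commas or line breaks."""
    counts = Counter()
    # Assumption: each comma- or newline-separated chunk is one phrase.
    for phrase in re.split(r"[,\n]", raw):
        words = re.findall(r"\w+", phrase.lower())
        # zip(words, words[1:]) yields consecutive pairs within the phrase.
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts


frequencies = phrase_bigram_counts(text)
for phrase, count in frequencies.most_common(3):
    print(phrase, count)
```

The result maps each two-word phrase to its count, so "wireless problems" and "technicians visits" each get a count of 2 and every other pair gets 1. A mapping of this shape is what the generate_from_frequencies method of the third-party wordcloud package accepts, assuming that package is installed, so the two frequent phrases would be drawn as single, larger entries in the cloud.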