How can I keep words together in a word cloud?

Date: 2016-09-07 11:18:17

Tags: python word-cloud

Edit: "Please focus only on answers to the example below, not on broad scenarios."

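OK. I have read articles about word clouds, but I would like to know how to represent the most frequently occurring words together, taken from a string variable, as in the following example: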

Var_x
wireless problems, migration to competitor
dissatisfied customers, technicians visits scheduled
call waiting, technicians visits
bad customer experience, wireless problems

So what I want is for "wireless problems" and "technicians visits" to each appear as a single unit in the cloud. How can I do that?

1 answer:

Answer 0: (score: 3)

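This code generates a frequency distribution of adjacent words (bigrams), which can be used as the underlying data for a word cloud: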

from nltk import bigrams, FreqDist
from nltk.tokenize import RegexpTokenizer
from operator import itemgetter

sent = 'wireless problems, migration to competitor\n\
dissatisfied customers, technicians visits scheduled\n\
call waiting, technicians visits\n\
bad customer experience, wireless problems'

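# Split on runs of word characters, which drops the commas and newlines
tokenizer = RegexpTokenizer(r'\w+')
sent_words = tokenizer.tokenize(sent)
# Count every pair of adjacent tokens
freq_dist = FreqDist(bigrams(sent_words))

# Print the pairs from most to least frequent
for k, v in sorted(freq_dist.items(), key=itemgetter(1), reverse=True):
    print(k, v)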

Output:

('technicians', 'visits') 2
('wireless', 'problems') 2
('dissatisfied', 'customers') 1
('bad', 'customer') 1
('scheduled', 'call') 1
('competitor', 'dissatisfied') 1
('migration', 'to') 1
('to', 'competitor') 1
('visits', 'scheduled') 1
('call', 'waiting') 1
('problems', 'migration') 1
('waiting', 'technicians') 1
('customers', 'technicians') 1
('customer', 'experience') 1
('experience', 'wireless') 1
('visits', 'bad') 1
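
Because the tokenizer discards commas and line breaks, the listing above also counts pairs that straddle two separate phrases, such as ('problems', 'migration') or ('visits', 'bad'). If that is not wanted, one possible variation is to split the text into phrases first and only pair words inside each phrase. The following is a minimal sketch of that idea using only the standard library instead of nltk; the function name phrase_bigram_counts is made up for this illustration, and it assumes that commas and newlines are the only phrase separators in the data.

```python
import re
from collections import Counter

text = (
    "wireless problems, migration to competitor\n"
    "dissatisfied customers, technicians visits scheduled\n"
    "call waiting, technicians visits\n"
    "bad customer experience, wireless problems"
)


def phrase_bigram_counts(text):
    """Count adjacent word pairs without crossing commas or line breaks.

    Hypothetical helper for illustration; assumes ',' and '\n' are the
    only phrase separators.
    """
    counts = Counter()
    for phrase in re.split(r"[,\n]", text):
        words = re.findall(r"\w+", phrase)
        # zip(words, words[1:]) yields each pair of neighbouring words;
        # joining the pair gives a single string key such as "wireless problems"
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts


frequencies = phrase_bigram_counts(text)
for phrase, count in frequencies.most_common():
    print(phrase, count)
```

The result maps each two-word phrase (as one string) to its count, so "wireless problems" and "technicians visits" both come out with a count of 2 and cross-phrase pairs disappear. A mapping of strings to frequencies like this is the kind of input that the wordcloud package's WordCloud.generate_from_frequencies accepts, which should let each pair be drawn as one unit in the cloud.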