Question

I am using nltk.collocations to find the collocated words in a given text, so that I can print out the collocated words (and their likelihood ratios) for a given word. The code is as follows:

import nltk.collocations
import collections

text = 'I like the customer service. The service personnel were good. So, I would recommend the customer service of XYZ company.'

word_list = []
for sent in nltk.sent_tokenize(text):
    for word in nltk.word_tokenize(sent):
        word_list.append(word)

bgm    = nltk.collocations.BigramAssocMeasures()
finder = nltk.collocations.BigramCollocationFinder.from_words(word_list)
scored = finder.score_ngrams(bgm.likelihood_ratio)

# Group bigrams by first word in bigram.                                        
prefix_keys = collections.defaultdict(list)
for key, scores in scored:
    prefix_keys[key[0]].append((key[1], scores))

# Sort keyed bigrams by strongest association.                                  
for key in prefix_keys:
    prefix_keys[key].sort(key = lambda x: -x[1])

Printing the results for "customer" and "service" gives:

print('Words collocated with "customer" are:', prefix_keys['customer'])
>>> Words collocated with "customer" are: [('service', 9.949042176926831)]
print('Words collocated with "service" are:', prefix_keys['service'])
>>> Words collocated with "service" are: [('of', 4.4947649141916255), ('personnel', 4.4947649141916255), ('.', 1.0572102767208427)]

As can be seen, "service" is shown as a collocate of "customer", but "customer" is not shown as a collocate of "service". So it seems that when NLTK says "collocated", it actually means "the following word".

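But collocation should cover both the forward and the reverse direction; that is, it should not matter whether "service" follows "customer" or "customer" follows "service": in either case each word should be shown as a collocate of the other.

So, how can I find actual collocations, and not just "following word" collocations?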
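To make the desired symmetric behavior concrete, here is a minimal sketch that post-processes the scored bigrams so that each pair is indexed under both of its words. It is only an illustration under stated assumptions, not an NLTK feature: the helper name `group_both_directions` is hypothetical, the input is assumed to have the same `((first, second), score)` shape as `scored` in the code above, and the sample scores are copied from the printed output.

```python
import collections


def group_both_directions(scored):
    """Index each scored bigram under both of its words (hypothetical helper).

    `scored` is assumed to be a list of ((first, second), score) pairs,
    the same shape as the `scored` list in the question's code.
    """
    neighbors = collections.defaultdict(list)
    for (first, second), score in scored:
        # Forward direction: `second` follows `first`.
        neighbors[first].append((second, score))
        # Reverse direction: `first` precedes `second`.
        if second != first:
            neighbors[second].append((first, score))
    # Sort each word's neighbors by strongest association first.
    for word in neighbors:
        neighbors[word].sort(key=lambda pair: -pair[1])
    return neighbors


# Sample scores copied from the output printed in the question.
sample = [
    (('customer', 'service'), 9.949042176926831),
    (('service', 'of'), 4.4947649141916255),
    (('service', 'personnel'), 4.4947649141916255),
    (('service', '.'), 1.0572102767208427),
]

both = group_both_directions(sample)
print('Words collocated with "service" are:', both['service'])
```

With this grouping, "customer" appears in the list for "service" as well as the other way around. Note that if a text contains both orders of a pair, for example ("customer", "service") and ("service", "customer"), each order is scored separately, so the same neighbor would be listed twice with different scores.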

How can I extract word associations (forward and reverse collocations) from text?

0 answers: