Question

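Is there a built-in function in Python that returns the most frequently occurring groups of three consecutive words? I know how to do this programmatically, but I am looking for a built-in. Also, the words are stored in the rows of a single field in a MySQL table, so I am open to a solution in either Python or MySQL.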

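For example, if my database has a field containing user comments, I want to retrieve the most frequently occurring runs of 3 consecutive words in those comments. One example of such 3 consecutive words is "in my opinion". I also know how to do this for a single word using SQL ... but I searched previous posts and could not find anything for 3 consecutive words.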

Answer 1

There is no built-in that does exactly what you need, but this list comprehension should work and is quite concise:

# Split the text into words, then join each run of three consecutive words.
l = 'there are no builtins for that'.split()
r = [" ".join(l[n:n+3]) for n in range(len(l)-2)]
print(r)
['there are no', 'are no builtins', 'no builtins for', 'builtins for that']

Then, with that resulting list named r, count how often each item occurs:

import collections
# Counter tallies how often each three-word string occurs in r.
c = collections.Counter(r)
print(c)
Counter({'there are no': 1, 'are no builtins': 1, 'no builtins for': 1, 'builtins for that': 1})
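
Since the question is about comments stored in many rows, the two steps above can be folded into one helper that counts across all comments and returns the top results. This is only a minimal sketch: the function name top_trigrams and the sample comments are made up for illustration, and the list of strings stands in for whatever you fetch from the MySQL column. Each comment is counted separately so that a group of three words never spans two comments.

```python
from collections import Counter

def top_trigrams(comments, k=3):
    """Return the k most common runs of three consecutive words."""
    counts = Counter()
    for comment in comments:
        words = comment.lower().split()
        # zip stops at the shortest input, so comments with fewer than
        # three words contribute nothing.
        counts.update(zip(words, words[1:], words[2:]))
    return [(" ".join(gram), n) for gram, n in counts.most_common(k)]

# Hypothetical sample data standing in for rows of the comments column.
comments = [
    "in my opinion this is great",
    "this is great in my opinion",
    "in my opinion it works",
]
print(top_trigrams(comments, k=2))
# [('in my opinion', 3), ('this is great', 2)]
```

Calling top_trigrams(comments) with the default k=3 gives the three most frequent groups, which is what the question asks for.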

Answer 2

Another option:

>>> from collections import Counter
>>> l = 'zip can be used for that. Counter can be used as well'.lower().split()
>>> counter = Counter(zip(l, l[1:], l[2:]))
>>> counter
Counter({('can', 'be', 'used'): 2, ('zip', 'can', 'be'): 1, ('be', 'used', 'for'): 1, ('used', 'for', 'that.'): 1, ('for', 'that.', 'counter'): 1, ('that.', 'counter', 'can'): 1, ('counter', 'can', 'be'): 1, ('be', 'used', 'as'): 1, ('used', 'as', 'well'): 1})

Now you can query the counts, for example to get the most common group of three words:

>>> counter.most_common(1)[0][0]
('can', 'be', 'used')

Or, if you want a single joined string:

>>> ' '.join(counter.most_common(1)[0][0])
'can be used'
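
Note that a plain split() keeps punctuation attached to words, which is why 'that.' shows up in the counts above. If that matters for your comments, one possible variation is to tokenize with a regular expression first. The sketch below is an assumption rather than part of the original answer: the helper name trigram_counts and the pattern are illustrative only, and the pattern handles just ASCII letters, digits and apostrophes.

```python
import re
from collections import Counter

def trigram_counts(text):
    # Keep only runs of letters, digits and apostrophes; this simple
    # pattern is an assumption and will not suit every language.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(zip(words, words[1:], words[2:]))

counts = trigram_counts("Zip can be used for that. Counter can be used as well")
print(" ".join(counts.most_common(1)[0][0]))  # can be used
```

With this tokenizer the group is counted as ('used', 'for', 'that') rather than ('used', 'for', 'that.'). Sentence boundaries are still ignored, so a group can span the full stop.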
