Question

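In Python, how can I develop an algorithm that finds the common patterns in the text of an array and reports how many times each of them occurs?

For example: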
line_arr=""" :)hello hi there,My name is 'pixel' can i speak to 'Tom'
Hi, tom here :) 'pixel'
how are u doing today.
i just called to ask whats the cost of the microwae oven is it $50 or $60
it is $75
any d $iscounts on this..
10% to 30%"""

reg_dict={}
for l in line_arr:
    # find all common patterns and update them in a dictionary
    pass

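Can we pick out all the emoticons, the names in single quotes, the currency amounts starting with $, the percentages, and other common things like these, and record them in the dictionary? Is this possible?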

Answer 1

What you have is a string, not an array. You should tokenize it first. Once that is done, you can use collections.Counter.most_common:

>>> from collections import Counter
>>> import re
>>> Counter(re.findall(r"\w+", line_arr)).most_common()[:5]
[('is', 3), ('to', 3), ('pixel', 2), ('it', 2), ('i', 2)]

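Tokens with equal counts may come out in a different order depending on your Python version. If you want to find emoticons, use a different tokenizer than the \w+ regex I used above.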
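
As a rough sketch of what such a different tokenizer could look like, the snippet below counts each kind of pattern from the question with its own regex. The pattern names, the regexes, and the count_patterns helper are all assumptions made up for this sample text, not anything from the standard library, and they will likely need tuning for real data.

```python
import re
from collections import Counter

line_arr = """ :)hello hi there,My name is 'pixel' can i speak to 'Tom'
Hi, tom here :) 'pixel'
how are u doing today.
i just called to ask whats the cost of the microwae oven is it $50 or $60
it is $75
any d $iscounts on this..
10% to 30%"""

# Hypothetical pattern names and regexes, chosen only to fit the sample text.
PATTERNS = {
    "emoticon": r"[:;]-?[)(DP]",           # e.g. :) ;-) :D
    "quoted_name": r"'(\w+)'",             # a word inside single quotes
    "dollar_amount": r"\$\d+(?:\.\d+)?",   # $ followed by a number
    "percentage": r"\d+(?:\.\d+)?%",       # a number followed by %
}


def count_patterns(text, patterns=PATTERNS):
    """Return a dict mapping each pattern name to a Counter of its matches."""
    return {name: Counter(re.findall(rx, text)) for name, rx in patterns.items()}


reg_dict = count_patterns(line_arr)
for name, counts in reg_dict.items():
    print(name, dict(counts))
```

On the sample text this finds ':)' twice, 'pixel' twice and 'Tom' once, the amounts $50, $60 and $75, and the percentages 10% and 30%. The stray "$iscounts" is skipped because the dollar pattern requires a digit right after the $.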

