Question

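I have already seen this answer to a similar question: https://stackoverflow.com/a/44311921/5881884

It uses the Aho-Corasick algorithm (ahocorasick) to show, in O(n), whether each word in a list occurs in a string. What I want instead is the frequency of each word of the list in the string.

For example, given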

my_string = "some text yes text text some"
my_list = ["some", "text", "yes", "not"]

I want the result:

[2, 3, 1, 0]

I did not find an exact example in the documentation. Any idea how to do this?

O(n) solutions that do not use ahocorasick would also be appreciated.

Answer 1

Implementation:

Here is an Aho-Corasick frequency counter:

import ahocorasick

def ac_frequency(needles, haystack):
    frequencies = [0] * len(needles)
    # Make a searcher; each needle is stored with its index in `needles`
    searcher = ahocorasick.Automaton()
    for i, needle in enumerate(needles):
        searcher.add_word(needle, i)
    searcher.make_automaton()
    # Add up all frequencies; iter() yields (end_index, stored_value) per match
    for _, i in searcher.iter(haystack):
        frequencies[i] += 1
    return frequencies

(For your example, you would call ac_frequency(my_list, my_string) to get the list of counts.)

For medium-to-large inputs, this will be substantially faster than the other methods.

Notes:

On real data, this method may produce different results from the other solutions, because Aho-Corasick finds all occurrences of the target words, including occurrences as substrings of longer words.

If you only want whole-word matches, you can call searcher.add_word with space/punctuation-padded versions of the original strings (padding the haystack as well, so that the first and last words still match).
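
To make the two notes above concrete (substring matches versus whole-word matches via padding), here is a minimal, dependency-free sketch of an Aho-Corasick counter in pure Python. It is not the pyahocorasick implementation, and the names build_automaton and count_matches are hypothetical. The whole-word mode assumes that single spaces are the only word delimiters; real text would need more padding variants for punctuation.

```python
from collections import deque


def build_automaton(needles):
    """Build trie transitions, failure links and per-state output lists."""
    goto, fail, out = [{}], [0], [[]]
    for index, needle in enumerate(needles):
        state = 0
        for ch in needle:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(index)
    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its path that is also a prefix of some needle.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def count_matches(needles, haystack, whole_words=False):
    """Count occurrences of each needle in one pass over haystack."""
    if whole_words:
        # Assumption: words are separated by single spaces only.
        needles = [" " + needle + " " for needle in needles]
        haystack = " " + haystack + " "
    goto, fail, out = build_automaton(needles)
    counts = [0] * len(needles)
    state = 0
    for ch in haystack:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for index in out[state]:
            counts[index] += 1
    return counts


print(count_matches(["some", "text", "yes", "not"], "some text yes text text some"))
# [2, 3, 1, 0]
print(count_matches(["text"], "context text"))                    # [2]
print(count_matches(["text"], "context text", whole_words=True))  # [1]
```

The second call counts the "text" inside "context", which is the substring behaviour described in the notes; the padded call counts only the standalone word.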

Answer 2

You can use a list comprehension to count how many times each word of the list occurs in my_string:

[my_string.split().count(i) for i in my_list]
[2, 3, 1, 0]

Answer 3

You can use a dictionary to count the occurrences of just the words you care about:

counts = dict.fromkeys(my_list, 0) # initialize the counting dict with all counts at zero

for word in my_string.split():
    if word in counts:     # this test filters out any unwanted words
        counts[word] += 1  # increment the count

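The counts dictionary will hold the count for each word. If you really do need a list of counts in the same order as your original list of keywords (rather than a dictionary), you can add a final step after the loop has finished: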

results = [counts[word] for word in my_list]

Answer 4

Counter from the collections module may be useful to you:

from collections import Counter

my_string = "some text yes text text some"
my_list = ["some", "text", "yes", "not"]

counter = Counter(my_string.split(' '))
[counter.get(item, 0) for item in my_list]

# out: [2, 3, 1, 0]
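
Real text usually contains punctuation, which a plain split leaves attached to the words. A minimal sketch of the same Counter idea, assuming that a word is a run of \w characters and that matching is case-sensitive; the function name word_frequencies is hypothetical:

```python
import re
from collections import Counter


def word_frequencies(words, text):
    # Tokenize on runs of word characters, so "text," and "text." count as "text".
    tokens = re.findall(r"\w+", text)
    counter = Counter(tokens)
    # A Counter returns 0 for missing keys, so .get(word, 0) is not required.
    return [counter[word] for word in words]


print(word_frequencies(["some", "text", "yes", "not"], "some text, yes: text. text some!"))
# [2, 3, 1, 0]
```

Tokenizing and counting each take one pass over the text, and each lookup is a constant-time dictionary access on average.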

Number of occurrences of a list of words in a string, in O(n)
