Question

So I have an array of words, stored as key-value pairs. Now I am trying to count the frequency of words in an array of strings, tokens. I tried the following, but it doesn't find the index of x, since x is only a string. I don't have the corresponding value, if any, of x in the tokens array. Is there a way to access it directly rather than adding one more loop to find it first?

Answer 1

To count the frequency of words in an array of strings, you can use Counter from collections:

In [89]: from collections import Counter

In [90]: s=r'So I have an array of words, stored as key value pairs. Now I am trying to count the frequency of words in an array of strings, tokens. I have tried the following but this doesnt find the index of x as it is only a string. I do not have the corresponding value, if any, of x in tokens array. Is there any way to directly access it rather than adding one more loop to find it first?'

In [91]: tokens=s.split()

In [92]: c=Counter(tokens)

In [93]: print(c)
Counter({'of': 5, 'I': 4, 'the': 4, 'it': 3, 'have': 3, 'to': 3, 'an': 2, 'as': 2, 'in': 2, 'array': 2, 'find': 2, 'x': 2, 'value,': 1, 'words': 1, 'do': 1, 'there': 1, 'is': 1, 'am': 1, 'frequency': 1, 'if': 1, 'string.': 1, 'index': 1, 'one': 1, 'directly': 1, 'tokens.': 1, 'any': 1, 'access': 1, 'only': 1, 'array.': 1, 'way': 1, 'doesnt': 1, 'Now': 1, 'words,': 1, 'more': 1, 'a': 1, 'corresponding': 1, 'tried': 1, 'than': 1, 'adding': 1, 'strings,': 1, 'but': 1, 'tokens': 1, 'So': 1, 'key': 1, 'first?': 1, 'not': 1, 'trying': 1, 'pairs.': 1, 'count': 1, 'this': 1, 'Is': 1, 'value': 1, 'rather': 1, 'any,': 1, 'stored': 1, 'following': 1, 'loop': 1})

In [94]: c['of']
Out[94]: 5
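
On the "access it directly" part of the question, here is a minimal sketch of how a Counter behaves on lookup. The sample sentence is only an illustrative stand-in for tokens. A Counter is a dict subclass, so a count is read by key with no extra loop, and a key that was never counted reads as 0 instead of raising KeyError:

```python
from collections import Counter

# Illustrative input; any list of strings works the same way.
tokens = "I wish I went".split()
counts = Counter(tokens)

# Direct lookup by key, no search loop needed.
print(counts["I"])        # 2

# A word that never appeared reads as 0 and is not added to the Counter.
print(counts["missing"])  # 0

# most_common(n) returns the n highest counts as (word, count) pairs.
print(counts.most_common(1))  # [('I', 2)]
```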

Edit:

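If you have an outer loop in which tokens changes on each iteration, counting the words manually, as @Alexander suggested, is a good approach. Also, Counter supports the + operator, which makes accumulating counts easier: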

In [30]: (c+c)['of']
Out[30]: 10
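
As a rough sketch of that accumulation, assume the outer loop yields a fresh tokens list on every pass. The batches list here is a hypothetical stand-in for that loop, not something from the question. Counter.update adds counts in place and ends up with the same totals as chaining +:

```python
from collections import Counter

# Hypothetical data: each inner list plays the role of `tokens`
# on one iteration of the outer loop.
batches = [
    "I wish I went".split(),
    "I went home".split(),
]

total = Counter()
for tokens in batches:
    # update() adds the new counts to the running totals in place.
    total.update(tokens)

# Same result as summing per-iteration Counters with +.
combined = Counter(batches[0]) + Counter(batches[1])

print(total["I"])         # 3
print(total == combined)  # True
```

update avoids building a new Counter on every pass, while + is convenient when the per-iteration counters are already available.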

Answer 2

You definitely want to use Counter, as @zhangzaochen suggested.

However, here is a more efficient way to write your code:

words = {}
for x in tokens:
    if x in words:
        words[x] += 1
    else:
        words[x] = 1
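
The if/else branch above can also be folded away. As a sketch (these are not the answerer's variants, and the sample sentence is illustrative), either collections.defaultdict or dict.get supplies the starting value of 0 for a word seen for the first time:

```python
from collections import defaultdict

# Illustrative input.
tokens = "I wish I went".split()

# Variant 1: defaultdict(int) creates missing keys with value 0 on first access.
words = defaultdict(int)
for x in tokens:
    words[x] += 1

# Variant 2: a plain dict, with get() providing the default of 0.
words_get = {}
for x in tokens:
    words_get[x] = words_get.get(x, 0) + 1

print(words["I"])      # 2
print(words_get["I"])  # 2
```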

You can also use a list comprehension:

tokens = "I wish I went".split()
words = {}
_ = [words.update({word: 1 if word not in words else words[word] + 1}) 
     for word in tokens]
>>> words
{'I': 2, 'went': 1, 'wish': 1}

Updating a dict stored in an array
