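Python - Iterate over a list of keywords, search a string for matches, and compute the final total

Time: 2019-06-25 16:39:36

Tags: python list loops text

I want to check a set of words to see whether they appear in a research abstract and, if so, count how many times they occur. I'm not sure what my code is doing wrong, but the count is incorrect. Thanks in advance!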


2 Answers:

Answer 0: (score: 1)

In general, a dict is a good choice for grouping. For counting, you could use an implementation like the following:

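c = {}

# keyword list from the question (the same terms echoed in the output further down)
mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
            'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar',
            'mh', 'smi', 'oud', 'opioid']

# adjacent string literals inside parentheses are joined into one string
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety.  My hope is that I get my code to work and '
                  'not resort to alcohol.')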

for s in singleabstract.split():
    s = ''.join(char.lower() for char in s if char.isalpha()) # '<punctuation>'.isalpha() yields False
    # you'll need to check if the word is in the dict
    # first, and set it to 1
    if s not in c:
        c[s] = 1
    # otherwise, increment the existing value by 1
    else:
        c[s] += 1

# You can sum the number of occurrences, but you'll need
# to use c.get to avoid KeyErrors
occurrences = sum(c.get(term, 0) for term in mh_terms)

occurrences
3

# or you can use an if in the generator expression
occurrences = sum(c[term] for term in mh_terms if term in c)

The best way to count occurrences is collections.Counter. It is a dict subclass, so key lookups are O(1):

from collections import Counter

# adjacent string literals inside parentheses are joined into one string
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety.  My hope is that I get my code to work and '
                  'not resort to alcohol.')

# the Counter can consume a generator expression analogous to
# the for loop in the dict implementation
c = Counter(''.join(char.lower() for char in s if char.isalpha()) 
            for s in singleabstract.split())

# Then you can iterate through
for term in mh_terms:
    # don't need to use get, as Counter will return 0
    # for missing keys, rather than raising KeyError 
    print(term, c[term]) 

mental 1
ptsd 0
sud 0
substance abuse 0
drug abuse 0
alcohol 1
alcoholism 0
anxiety 1
depressing 0
bipolar 0
mh 0
smi 0
oud 0
opioid 0

To get the desired output, sum the Counter's counts for the terms:

total_occurrences = sum(c[v] for v in mh_terms)

total_occurrences
3
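
Note that multi-word terms such as "substance abuse" will always show 0 with the approach above, because the abstract is split on whitespace and counted one word at a time. If phrases need to be counted too, one option is a whole-word regular expression search per term. The following is only a sketch under that assumption: the helper name count_terms, the shortened term list, and the sample text are made up for illustration and are not part of the original question.

```python
import re

# illustrative subset of the keyword list, plus a made-up sample text
mh_terms = ['mental', 'substance abuse', 'alcohol', 'anxiety']
abstract = ('Substance abuse and anxiety often co-occur; alcohol misuse is one '
            'form of substance abuse.')

def count_terms(text, terms):
    """Count whole-word, case-insensitive matches of each term in text."""
    counts = {}
    for term in terms:
        # \b anchors keep 'alcohol' from matching inside 'alcoholism';
        # re.escape treats the term as literal text, spaces included
        pattern = r'\b' + re.escape(term) + r'\b'
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

counts = count_terms(abstract, mh_terms)
total = sum(counts.values())
print(counts)
print(total)
```

This scans the text once per term, which is fine for a short keyword list and a single abstract; for very large inputs a single combined pattern might be preferable.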

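Answer 1: (score: 1)

First, print(number_of_occurences) should be placed inside the loop over each mh, so that it shows the number of occurrences of that particular word. Second, include the word itself in the printed message. I think the main problem with your program is that you should use mh.lower() rather than mh.lower: without the parentheses you get the method object itself, not the lowercased string.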
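
The asker's Python loop is not available in this thread, so the following is a hypothetical reconstruction of what the corrected version might look like, not the original code. It assumes a loop variable mh, a shortened keyword list, and substring counting with str.count; the variable names other than mh are guesses.

```python
# Hypothetical reconstruction; shortened keyword list for illustration
mh_terms = ['mental', 'ptsd', 'alcohol', 'anxiety']
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety.  My hope is that I get my code '
                  'to work and not resort to alcohol.')

text = singleabstract.lower()
total = 0
for mh in mh_terms:
    # mh.lower is the bound method object; mh.lower() is the lowercased string
    number_of_occurrences = text.count(mh.lower())
    # printed inside the loop, together with the word it belongs to
    print(mh, number_of_occurrences)
    total += number_of_occurrences

print('total', total)
```

Keep in mind that str.count matches substrings, so a term like 'alcohol' would also be counted inside 'alcoholism'; splitting into words first, as in the other answer, avoids that.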