Weighted count of words in a string using Python

Time: 2013-07-15 13:21:23

Tags: python list dictionary

I have a string:

foo = "This is a string"

I also have a list formatted as follows:

bar = ["this","3"], ["is","5"]

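I need to write a script that searches foo for the words in bar and, whenever a word is found, adds the number next to that word to a counter. This is how far I got: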

bar_count=0
for a,b in foo:
   if bar in a:
       bar_count+=b

But this doesn't seem to work. Does anyone have any ideas?

6 answers:

Answer 0: (score: 2)

Use a dict to keep the counts:

foo = "This is a string"
words = foo.lower().split()  # lowercase so "This" matches the "this" key in scores
count = {}
scores = {"this": 3,
          "is": 5
}

for word in words:
    if word not in count:
        count[word] = 0

    if word in scores:
        count[word] += scores[word]
    else:
        count[word] += 1
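The loop above can be written more compactly with dict.get. The following is only a sketch in Python 3: the function name weighted_counts is hypothetical, and it assumes the text should be lowercased so that "This" matches the "this" key.

```python
def weighted_counts(text, scores):
    """Return {word: total}: add scores[word] per occurrence, or 1 if the word has no score."""
    counts = {}
    for word in text.lower().split():  # lowercase so "This" matches "this"
        counts[word] = counts.get(word, 0) + scores.get(word, 1)
    return counts


result = weighted_counts("This is a string", {"this": 3, "is": 5})
print(result)  # {'this': 3, 'is': 5, 'a': 1, 'string': 1}
```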

Answer 1: (score: 1)

Use collections.defaultdict

>>> import collections
>>> foo = "This is a string string This bar"
>>> dic = collections.defaultdict(int)
>>> for f in foo.split():
...     dic[f] += 1
>>> dic
defaultdict(<type 'int'>, {'This': 2, 'a': 1, 'is': 1, 'bar': 1, 'string': 2})

Edit

Create a dictionary from your current list; a dict is a better representation of this data

>>> foo = 'this is a string this bar'
>>> bar = [['this', 3], ['is', 5]]
>>> dic = dict(bar)
>>> dict(bar)
{'this': 3, 'is': 5}

Now look up the words of the string in the dict and add to their values

>>> for f in foo.split():
...     try:
...         dic[f] += 1
...     except KeyError:  # word has no entry in dic; skip it
...         pass
>>> dic
{'this': 5, 'is': 6}

Does this help?
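Note that the list in the question stores the weights as strings ("3", "5"), so dict(bar) on that list would keep string values, and adding 1 to them would not work. A minimal Python 3 sketch that converts the weights to int first, assuming the same "add 1 per occurrence" behaviour as above:

```python
foo = "this is a string this bar"
bar = ["this", "3"], ["is", "5"]  # weights are strings, as in the question

# Convert each weight to int while building the dict.
dic = {word: int(weight) for word, weight in bar}

for f in foo.split():
    if f in dic:  # only words that already have a starting value
        dic[f] += 1

print(dic)  # {'this': 5, 'is': 6}
```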

Answer 2: (score: 1)

This code creates a dictionary with the words found as keys; each value is the number of times that word occurs:

foo = "This is a string is is"
bar = {}

words = foo.split(" ")

for w in words:
    if w in bar:
        # it's already there, just increment its value
        bar[w] += 1
    else:
        # it's not there yet, make a new key with value 1
        bar[w] = 1

for i in bar:
    print i,"->", bar[i]

This code yields:

>>> 
This -> 1
a -> 1
is -> 3
string -> 1
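The same plain word count can also be obtained with collections.Counter, which has the advantage that most_common() returns the words in a predictable order (most frequent first). This is only a Python 3 sketch of that alternative, not part of the original answer:

```python
from collections import Counter

foo = "This is a string is is"
counts = Counter(foo.split())  # maps each word to its number of occurrences

# most_common() lists the words from most to least frequent
for word, n in counts.most_common():
    print(word, "->", n)
```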

Answer 3: (score: 1)

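If you just want a total, convert bar to a dict and use it to look up the valid words, defaulting to 0 for unknown ones, so that the result can be run through sum: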

foo = "This is a string"
bar = ["this","3"], ["is","5"]
scores = {w: int(n) for w, n in bar}
bar_count = sum(scores.get(word, 0) for word in foo.lower().split())
# 8

If you want a count per word, but with each word starting from the total specified in bar:

from collections import Counter
start = Counter({w: int(n) for w, n in bar})
total = start + Counter(foo.lower().split())
# Counter({'is': 6, 'this': 4, 'a': 1, 'string': 1})

Answer 4: (score: 1)

This works for your case

foo = "This is a string"
bar = ["this","3"], ["is","5"]

bar_count = 0
for word, value in bar:
   if foo.lower().count(word) > 0:  # lowercase foo so "This" matches "this"
       bar_count += int(value)
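One caveat with str.count is that it matches substrings, so "is" is also found inside "This". If only whole words should count, a possible variant (a sketch in Python 3, assuming case-insensitive matching is wanted) compares against the set of split words instead:

```python
foo = "This is a string"
bar = ["this", "3"], ["is", "5"]

# str.count matches substrings: "is" occurs inside "This" and as the word "is"
substring_hits = foo.count("is")  # 2

# Whole-word variant: test membership in the set of lowercased words
words = set(foo.lower().split())
bar_count = sum(int(value) for word, value in bar if word in words)
print(bar_count)  # 8
```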

Answer 5: (score: 1)

This doesn't use explicit loops (apart from the comprehension), and I think it is easy to understand:

import collections
weight_list = ["this","3"], ["is","5"]
foo = "This is a string"

def weighted_counter(weight_list, countstring):
    #create dict {word:count of word}. uses lower() because that's
    # the format of the weight_list
    counts = collections.Counter(countstring.lower().split())

    #multiply weight_list entries by the number of appearances in the string
    return {word:int(weight)*counts.get(word,0) for word,weight in weight_list}

print weighted_counter(weight_list, foo)
#{'this': 3, 'is': 5}
#take the sum of the values (not keys) in the dict returned
print sum(weighted_counter(weight_list, "that is the this is it").itervalues())
#13

In action: http://ideone.com/ksdI1b
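The code above is written for Python 2 (print statement, itervalues). For Python 3, a port might look like the sketch below; it keeps the same function name and logic and only swaps in print() and values(), and it indexes the Counter directly, which returns 0 for missing words.

```python
from collections import Counter


def weighted_counter(weight_list, countstring):
    # Count each lowercased word, then multiply by its weight.
    counts = Counter(countstring.lower().split())
    return {word: int(weight) * counts[word] for word, weight in weight_list}


weight_list = ["this", "3"], ["is", "5"]
print(weighted_counter(weight_list, "This is a string"))  # {'this': 3, 'is': 5}
print(sum(weighted_counter(weight_list, "that is the this is it").values()))  # 13
```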