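Python - Counting repeated strings

Time: 2015-09-12 01:55:04

Tags: python duplicates counting

I am trying to write a function that counts how many times each word is repeated in a string and then returns the word if its count exceeds a certain number (n). This is what I have so far: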

from collections import defaultdict

def repeat_word_count(text, n):
  words = text.split()
  tally = defaultdict(int)
  answer = []

  for i in words:
    if i in tally:
      tally[i] += 1
    else:
      tally[i] = 1

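I don't know where to start when it comes to comparing the dictionary values with n.

How it should work: repeat_word_count("one one was a racehorse two two was one too", 3) should return ['one']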

5 answers:

Answer 0: (score: 2)

Try

for i in words:
    tally[i] = tally.get(i, 0) + 1

instead of

for i in words:
    if i in tally:
         tally[words] += 1  # bug: words is the whole list (unhashable); the key should be the item i
    else:
         tally[words] = 1

If you only want to count words, use collections.Counter:

>>> import collections
>>> a = collections.Counter("one one was a racehorse two two was one too".split())
>>> a
Counter({'one': 3, 'two': 2, 'was': 2, 'a': 1, 'racehorse': 1, 'too': 1})
>>> a['one']
3
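
Building on the Counter shown above, the thresholding step the question asks about could be sketched as follows. This is only an illustration: the function name words_over_threshold is hypothetical, and it assumes "exceeds n" means a strict greater-than comparison.

```python
from collections import Counter


def words_over_threshold(text, n):
    """Return the words that occur more than n times in text."""
    counts = Counter(text.split())
    # Counter is a dict subclass, so items() yields (word, count) pairs.
    return [word for word, count in counts.items() if count > n]


sentence = "one one was a racehorse two two was one too"
print(words_over_threshold(sentence, 2))  # ['one']
```

The result is always a list, so callers get an empty list rather than an error when no word is frequent enough.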

Answer 1: (score: 0)

If what you want is a dictionary that counts the words in a string, you can try this:

string = 'hello world hello again now hi there hi world'.split()
d = {}
for word in string:
    d[word] = d.get(word, 0) +1
print(d)

Answer 2: (score: 0)

Here is one way to do it:

from collections import defaultdict
tally = defaultdict(int)
text = "one two two three three three"
for i in text.split():
    tally[i] += 1
print(tally)  # on Python 3.7+: defaultdict(<class 'int'>, {'one': 1, 'two': 2, 'three': 3})

Putting this into a function:

def repeat_word_count(text, n):
    output = []
    tally = defaultdict(int)
    # count every word
    for i in text.split():
        tally[i] += 1
    # keep the words seen more than n times
    for k in tally:
        if tally[k] > n:
            output.append(k)
    return output

text = "one two two three three three four four four four"
repeat_word_count(text, 2)
Out[141]: ['four', 'three']  
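
One detail worth checking is the comparison itself. The question expects ['one'] for n = 3, yet 'one' occurs exactly three times in the sentence counted in Answer 0, so a strict > test returns an empty list for that input. The sketch below makes the choice explicit; the function name repeat_words and the inclusive flag are hypothetical and not part of any answer here.

```python
from collections import Counter


def repeat_words(text, n, inclusive=False):
    """Words seen more than n times, or at least n times if inclusive."""
    counts = Counter(text.split())
    if inclusive:
        return [w for w, c in counts.items() if c >= n]
    return [w for w, c in counts.items() if c > n]


sample = "one one was a racehorse two two was one too"
print(repeat_words(sample, 3))                  # []
print(repeat_words(sample, 3, inclusive=True))  # ['one']
```

Which of the two behaviours is wanted depends on whether "exceeds n" in the question was meant literally.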

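Answer 3: (score: 0)

As luoluo said, use collections.Counter.

To get the item with the highest count, use the Counter.most_common method with argument 1. It returns a list containing the (word, tally) pair whose second coordinate is largest. If the "sentence" contains at least one word, that list is non-empty. The following function therefore returns a word that occurs more than n times, if there is one, and None otherwise: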

from collections import Counter

def repeat_word_count(text, n):
    words = text.split() if text else []   # guard against '', None and whitespace-only text
    if not words:
        return None
    word, count = Counter(words).most_common(1)[0]
    return word if count > n else None
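
To make the behaviour of most_common(1) concrete, here is a small sketch (the variable names are only illustrative). It shows the shape of the return value and what happens when the text contains no words, which is why any code that indexes the result with [0] has to handle that case.

```python
from collections import Counter

counts = Counter("one two two three three three".split())

top = counts.most_common(1)      # list holding the single best (word, count) pair
print(top)                       # [('three', 3)]

empty = Counter("   ".split())   # whitespace-only text splits into no words
print(empty.most_common(1))      # []
```

Note that most_common(1) yields at most one pair, so this approach reports a single word even when several words pass the threshold.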

Answer 4: (score: 0)

Why not use the Counter class in this case?

from collections import Counter
cnt = Counter(text.split())

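Elements are stored as dictionary keys and their counts as dictionary values. Keeping the words whose count exceeds n is then just a for loop over the keys (iterkeys() in Python 2, or simply iterating over the Counter itself):

result = []  # not named "list", which would shadow the built-in
for k in cnt:  # cnt.iterkeys() on Python 2
    if cnt[k] > n:
        result.append(k)

The resulting list then holds the matching words.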

Edit: sorry, the above is for when you need many words; BrianO has the right answer for your case.