Python word counter: show only repeated words

Time: 2015-10-03 13:03:54

Tags: python counter word

Can someone help me? I am trying to write a word counter in Python, but I don't want words in the result if they are not repeated.

from collections import Counter
import collections

with open('test.txt') as myFile:
    array =[]
    for word in myFile:
        #convert all to lowercase
        word_lower = word.lower()

        #escape punctuation
        import string
        for row in string.punctuation:
            word_lower = word_lower.replace(row,"")
        array.append(word_lower)

        a = collections.Counter(array)

    print a

myfile looks like this:

test

john

mike

test

But my output currently shows everything; I only want a name to be shown when it appears more than once.

3 Answers:

Answer 0 (score: 2)

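You need to filter after using a Counter dict to count the words. You can map str.split over the file object, chain the resulting words together, strip the punctuation from each, and keep only the keys whose value in the Counter dict is > 1: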

from collections import Counter
from itertools import chain
from string import punctuation

with open('test.txt') as f:
    cn = Counter(w.lower().rstrip(punctuation) 
                 for w in chain.from_iterable(map(str.split,f)))
    # if v > 1 the word appeared at least twice
    print [w for w,v in cn.items() if v > 1]

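It does not matter how many words there are per line; splitting removes the newlines and other whitespace. Chaining joins all the words into one stream, which lets you call rstrip to remove trailing punctuation from each word.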
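To see why the number of words per line does not matter, here is a minimal Python 3 sketch of the same pipeline run on an in-memory list of lines instead of a file. The helper name `repeated_words` and the sample lines are assumptions made for illustration; they are not part of the original answer.

```python
from collections import Counter
from itertools import chain
from string import punctuation


def repeated_words(lines):
    """Return the words that occur more than once (hypothetical helper)."""
    # str.split with no arguments splits on any whitespace, so newlines vanish.
    words = chain.from_iterable(map(str.split, lines))
    # Lowercase each word and strip trailing punctuation only.
    counts = Counter(w.lower().rstrip(punctuation) for w in words)
    return sorted(w for w, n in counts.items() if n > 1)


# Made-up sample: one word per line, several per line, mixed case.
sample = ["test\n", "John mike\n", "Test, john!\n"]
print(repeated_words(sample))  # prints ['john', 'test']
```

The result is sorted only to make the output order predictable; drop `sorted` if order does not matter to you.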

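If you want to remove punctuation from anywhere in a word, just use str.translate (the two-argument call shown here is the Python 2 form):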

from collections import Counter
from itertools import chain
from string import punctuation

with open('test.txt') as f:
    cn = Counter(w.lower().translate(None, punctuation) 
                 for w in chain.from_iterable(map(str.split,f)))
    print [w for w,v in cn.items() if v > 1]

w.translate(None, punctuation) turns foo's into foos, whereas rstrip would keep foo's; you have to decide which is more appropriate.
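On Python 3, `str.translate` takes a single table argument, so the two-argument call above does not work on text strings. As a sketch of the same comparison there, a deletion table built with `str.maketrans` can be used; the variable names are illustrative only.

```python
from string import punctuation

# The third argument to str.maketrans lists the characters to delete.
delete_punct = str.maketrans("", "", punctuation)

word = "foo's,"
stripped_everywhere = word.translate(delete_punct)  # removes both ' and ,
stripped_right = word.rstrip(punctuation)           # removes only the trailing ,

print(stripped_everywhere)  # foos
print(stripped_right)       # foo's
```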

If you want a dict as output, just change the list comprehension:

out = {w:v for w,v in cn.items() if v > 1}

Using str.translate or str.rstrip(punctuation) is more efficient than looping with for row in string.punctuation... and calling replace on every pass.
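If you want to check that claim on your own machine, the rough sketch below confirms that both approaches give the same result and then times them with `timeit`. It targets Python 3, the function names and sample text are assumptions, and no timings are quoted because they vary with the interpreter and the input.

```python
import timeit
from string import punctuation

# Deletion table built once and reused for every call.
DELETE_PUNCT = str.maketrans("", "", punctuation)


def strip_with_replace(text):
    """Remove punctuation with one str.replace call per punctuation character."""
    for char in punctuation:
        text = text.replace(char, "")
    return text


def strip_with_translate(text):
    """Remove punctuation in a single pass using the precomputed table."""
    return text.translate(DELETE_PUNCT)


sample = "hello, world! it's a test..."
# Both approaches must agree before a timing comparison means anything.
assert strip_with_replace(sample) == strip_with_translate(sample)

if __name__ == "__main__":
    for func in (strip_with_replace, strip_with_translate):
        seconds = timeit.timeit(lambda: func(sample), number=10000)
        print(func.__name__, seconds)
```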

Answer 1 (score: 0)

    import collections

    with open('data.csv') as myFile:
        lines = myFile.read().lower().splitlines()
        dict = collections.Counter(lines)

    print(dict)

>>> Counter({'kim': 2, 'mike': 1, 'john': 1, 'test': 1})

Your code is missing some key concepts about reading files and about Counter.

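Counter takes an iterable, such as a list, and counts how often each item occurs. So when you read from the file, you need to collect its contents into a list first, as the script above does when it builds lines.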

Now you have a collections.Counter object with the names as keys and the counts as values. You can pull those out and output the ones whose count is 2 or more.

For the latter part, you can do this:

filtered_dict = {k: v for k, v in dict.items() if v >= 2}
>>> {'kim': 2}
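Putting the two steps together, here is a small Python 3 sketch that counts whole lines and keeps only the repeats. The sample lines are assumed, chosen to match the counts in the output above, and the name `counts` is used instead of `dict` so that the built-in is not shadowed.

```python
from collections import Counter

# Assumed file contents, one name per line.
lines = ["kim", "mike", "john", "Kim", "test"]

# Lowercase each line so "Kim" and "kim" count as the same name.
counts = Counter(line.lower() for line in lines)

# Keep only the entries seen two or more times.
filtered = {name: n for name, n in counts.items() if n >= 2}
print(filtered)  # {'kim': 2}
```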

Answer 2 (score: 0)

I managed to do it myself. Here is how:

import collections
import string

with open('test.txt') as myFile:
    array = []
    for word in myFile:
        # convert to lowercase and drop the trailing newline
        word_lower = word.lower().rstrip("\n")
        # skip empty rows in file
        if word_lower != "":
            # remove punctuation
            for char in string.punctuation:
                word_lower = word_lower.replace(char, "")
            # save result in array
            array.append(word_lower)

# count once, after the whole file has been read
a = collections.Counter(array)
# print only the items that occur more than once (Python 2 print statement)
for item in a:
    if a[item] > 1:
        print item, a[item]