Question

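In Python, how can I iterate through a text file and count the number of occurrences of each letter? I realise I could use a 'for x in file' statement to go through it and then set up 26 or so if/elif statements, but surely there is a better way?

Thanks.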

Answer 1

Use collections.Counter():

from collections import Counter
with open(file) as f:
    c = Counter()
    for x in f:
        c += Counter(x.strip())

As @mgilson pointed out, if the file is not that big you can simply do:

c = Counter(f.read().strip())

Example:

>>> c = Counter()
>>> c += Counter('aaabbbcccddd eee fff ggg')
>>> c
Counter({'a': 3, ' ': 3, 'c': 3, 'b': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3})
>>> c += Counter('aaabbbccc')
>>> c
Counter({'a': 6, 'c': 6, 'b': 6, ' ': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3})

Or use the count() string method:

from string import ascii_lowercase     # ascii_lowercase =='abcdefghijklmnopqrstuvwxyz'
with open(file) as f:
    text = f.read().strip()
    dic = {}
    for x in ascii_lowercase:
        dic[x] = text.count(x)
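
Note that the two snippets above do not count the same things: Counter tallies every character it sees (interior spaces, punctuation, uppercase and lowercase separately), while the count() loop only looks at lowercase ASCII letters. A minimal sketch, assuming you want letters only and want to ignore case; the helper name count_letters_in_text and the sample text are made up for illustration:

```python
from collections import Counter

def count_letters_in_text(text):
    """Count alphabetic characters only, ignoring case (hypothetical helper)."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

sample = "Aaa bbb, CCC!"
letter_counts = count_letters_in_text(sample)
print(letter_counts.most_common())
```

With a real file you would pass f.read() (or each line in turn) instead of the sample string.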

Answer 2

Use a dictionary: basically letters[char]++
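
A minimal sketch of that idea. Python has no ++ operator, so the increment is written here with dict.get and a default of 0; the names letters and text are only illustrative:

```python
text = "hello world"

letters = {}
for char in text:
    # get() returns 0 the first time a character is seen
    letters[char] = letters.get(char, 0) + 1

print(letters)
```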

Answer 3

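Basically without imports: is_letter is a function that decides whether something counts as a letter, so you can count things other than the usual English letters.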

def add_or_init(dictionary, c):
    if c in dictionary:
        dictionary[c] += 1
    else:
        dictionary[c] = 1

def count_one_letter(dictionary, c, is_letter):
    if is_letter(c):
        add_or_init(dictionary, c)

def count_letters(dictionary, string, is_letter):
    for c in string:
        count_one_letter(dictionary, c, is_letter)
    return dictionary

#count all characters
count_letters(dict(),'aaabbbcccddd eee fff ggg',lambda x: True)
# => {'a': 3, ' ': 3, 'c': 3, 'b': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3}
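
For instance, passing str.isalpha as is_letter should count letters (including non-English ones such as 'é') while skipping spaces and digits. This is only a sketch, assuming Python 3, where strings are Unicode; it repeats a condensed version of count_letters so that it runs on its own, and the sample string is made up:

```python
def count_letters(dictionary, string, is_letter):
    """Condensed version of the function above: count chars accepted by is_letter."""
    for c in string:
        if is_letter(c):
            dictionary[c] = dictionary.get(c, 0) + 1
    return dictionary

# "\u00e9" is the letter e with an acute accent; the string reads "cafe 42 ete" with accents
letters_only = count_letters({}, "caf\u00e9 42 \u00e9t\u00e9", str.isalpha)
print(letters_only)
```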

Answer 4

Counter is a nice way to do it, but Counter is only available in 3.1 and later, plus 2.7.

If you are on 3.0 or 2.[56], you should probably use collections.defaultdict(int) instead.
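
A sketch of what that fallback might look like, assuming the text is already in a string (the sample text is made up). Missing keys start at 0, so no membership check is needed:

```python
from collections import defaultdict

text = "aaabbbccc ddd"

counts = defaultdict(int)  # int() returns 0, the starting count for unseen keys
for char in text:
    counts[char] += 1

print(dict(counts))
```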

Answer 5

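This approach builds a dictionary histogram of every character, which could be used to create a bar chart or something similar. If you want to limit it to letters or some subset, you will need to add an extra condition, or filter freqs at the end.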

freqs = {}
for line in file_list:
    for char in line:
        if char in freqs:
            freqs[char] += 1
        else:
            freqs[char] = 1

print(freqs)

I'm assuming you have already opened the file and populated *file_list* with its contents.
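
One possible way to cover both of those steps, as a sketch: populate file_list from a file object, build freqs as above, then filter down to letters at the end. io.StringIO stands in for a real open file here so the example is self-contained, and letter_freqs is an illustrative name:

```python
import io

# Stand-in for a real file; with a real file use open(...) and readlines() instead
fake_file = io.StringIO("ab a\nb2 c\n")
file_list = fake_file.readlines()

freqs = {}
for line in file_list:
    for char in line:
        if char in freqs:
            freqs[char] += 1
        else:
            freqs[char] = 1

# Filter at the end: keep only alphabetic characters
letter_freqs = {char: n for char, n in freqs.items() if char.isalpha()}
print(letter_freqs)
```

Note that freqs itself still contains spaces, digits and newline characters; only the filtered copy is restricted to letters.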

Counting the frequency of letters in a text file
