Count the occurrences of every letter in a text file

Time: 2017-01-08 04:20:37

Tags: python-3.x dictionary text-files counting

I am trying to open a file and count how many times each letter occurs.

This is what I have so far:

def frequencies(filename):
    infile=open(filename, 'r')
    wordcount={}
    content = infile.read()
    infile.close()
    counter = {}
    invalid = "‘'`,.?!:;-_\n—' '"

    for word in content:
        word = content.lower()
        for letter in word:
            if letter not in invalid:
                if letter not in counter:
                    counter[letter] = content.count(letter)
                    print('{:8} appears {} times.'.format(letter, counter[letter]))

Any help is greatly appreciated.

3 answers:

Answer 0 (score: 0)

One approach is to use numpy.unique with return_counts=True, for example:

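import numpy

text = "xvasdavawdazczxfawaczxcaweac"
text = list(text)  # split the string into single characters
# a: the distinct characters (sorted), b: how often each one occurs
a, b = numpy.unique(text, return_counts=True)
# pair each count with its character and sort from most to least frequent
x = sorted(zip(b, a), reverse=True)
print(x)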

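In your case, you can combine all the words into a single string and then convert that string into a list of characters. If you want to remove everything except letters, you can clean the string with a regular expression first: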

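import re
import numpy

# content is the file text read in frequencies()
# lower-case first so 'A' and 'a' are counted together, then drop everything except a-z
content = re.sub(r'[^a-z]', '', content.lower())
# convert the string to a list of characters
content = list(content)
# a: the distinct letters, b: how often each one occurs
a, b = numpy.unique(content, return_counts=True)
# (count, letter) pairs from most to least frequent
x = sorted(zip(b, a), reverse=True)
print(x)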
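Putting those steps together, a minimal sketch of the whole numpy approach as reusable functions might look like the following. The names letter_frequencies_numpy and letter_frequencies_from_file are hypothetical, and the sketch assumes the file is UTF-8 text and that only the ASCII letters a-z should be counted, with upper and lower case merged. The numpy scalars are converted to plain int and str so the result prints cleanly.

```python
import re

import numpy


def letter_frequencies_numpy(text):
    """Return (count, letter) pairs for a-z in text, most frequent first.

    Hypothetical helper: upper and lower case are merged, and every
    character outside a-z (digits, punctuation, whitespace) is dropped.
    """
    letters = list(re.sub(r'[^a-z]', '', text.lower()))
    # distinct letters (sorted) and how many times each one occurs
    values, counts = numpy.unique(letters, return_counts=True)
    # convert numpy scalars to plain Python types, then sort by count, descending
    pairs = [(int(c), str(v)) for v, c in zip(values, counts)]
    return sorted(pairs, reverse=True)


def letter_frequencies_from_file(filename):
    # read the whole file at once; assumes UTF-8 encoded text
    with open(filename, encoding='utf-8') as infile:
        return letter_frequencies_numpy(infile.read())


print(letter_frequencies_numpy("Hello, World!"))
```

Ties are ordered by the letter in reverse, because the (count, letter) tuples are sorted as a whole with reverse=True.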

Answer 1 (score: 0)

If you are looking for a solution that does not use numpy:

invalid = set("‘'`,.?!:;-_\n—' '")  # characters to skip

def frequencies(filename):
    counter = {}
    with open(filename, 'r') as f:
        # lower-case each character so 'A' and 'a' share one count
        for ch in (char.lower() for char in f.read()):
            if ch not in invalid:
                if ch not in counter:
                    counter[ch] = 0
                counter[ch] += 1

    # (count, character) pairs, sorted in ascending order of count
    results = [(counter[ch], ch) for ch in counter]
    return sorted(results)

# print from most to least frequent
for result in reversed(frequencies(filename)):
    print(result)

Answer 2 (score: 0)

I suggest using collections.Counter instead.

Compact solution:

from collections import Counter
from string import ascii_lowercase # a-z string

VALID = set(ascii_lowercase)

with open('in.txt', 'r') as fin:
    counter = Counter(char.lower() for line in fin for char in line if char.lower() in VALID)
    print(counter.most_common()) # print (letter, count) pairs from most to least common

More readable solution:

from collections import Counter
from string import ascii_lowercase # a-z string

VALID = set(ascii_lowercase)

with open('in.txt', 'r') as fin:
    counter = Counter()
    for char in (char.lower() for line in fin for char in line):
        if char in VALID:
            counter[char] += 1
print(counter)

If you don't want to use Counter, you can simply use a dict.
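The answer's closing suggestion to use a plain dict instead of Counter is not spelled out in code; a minimal sketch under the same assumptions as the Counter version (only a-z counted, case merged) might look like this. The function names letter_counts and most_common are hypothetical, and dict.get supplies the starting count of 0 that Counter would otherwise provide.

```python
from string import ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'

VALID = set(ascii_lowercase)


def letter_counts(lines):
    """Count a-z letters (case-insensitive) across an iterable of strings."""
    counts = {}
    for line in lines:
        for char in line.lower():
            if char in VALID:
                # dict.get returns 0 the first time a letter is seen
                counts[char] = counts.get(char, 0) + 1
    return counts


def most_common(counts):
    # (letter, count) pairs, highest count first; ties broken alphabetically
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# An open file also works, since iterating it yields lines:
#     with open('in.txt', 'r') as fin:
#         print(most_common(letter_counts(fin)))
print(most_common(letter_counts(["Hello,", "World!"])))
```

Because letter_counts accepts any iterable of strings, it can be tested on a list without creating a file.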