Question

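I need to count the characters in a given file. The problem is that I am not splitting the file correctly. If my input file contains "The! dog-ate ##### the,cat", I don't want the special characters in the output. o/p: t:4 h:2 e:3 !:1 d:1 o:1 g:1 -:1 #:5 .... Also, I need to remove the "-" sign and make sure the words do not get concatenated.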

    from collections import Counter
    import sys
    filename = sys.argv[1]
    reg = '[^a-zA-Z+]'
    f = open(filename, 'r')
    x = f.read().strip()
    lines=[]
    for line in x:
       line = line.strip().upper()
       if line:
           lines.append(line)
    print(Counter(lines))

Can someone help me?

Answer 1

Use re.sub to remove the special characters.

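import re
import sys
from collections import Counter

filename = sys.argv[1]  # file path from the command line, as in the question
with open(filename) as f:
    # Drop every character that is not an ASCII letter
    content = re.sub('[^a-zA-Z]', '', f.read())
counts = Counter(content)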

Demo:

In [1]: re.sub('[^a-zA-Z]', '', "The! dog-ate #####the,cat")
Out[1]: 'Thedogatethecat'

In [2]: Counter(_)
Out[2]: 
Counter({'T': 1,
         'a': 2,
         'c': 1,
         'd': 1,
         'e': 3,
         'g': 1,
         'h': 2,
         'o': 1,
         't': 3})

Note that if you want uppercase and lowercase letters counted together, you can convert content to lowercase:

counts = Counter(content.lower())
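
The question also asks that words such as "dog-ate" not be joined together. A minimal sketch of one way to handle that, assuming only ASCII letters matter: re.findall returns each run of letters as a separate word, and the letters are counted afterwards. The helper name count_letters is illustrative, not part of the original answer.

```python
import re
from collections import Counter

def count_letters(text):
    """Return the separate words and a case-insensitive letter count."""
    # Each run of ASCII letters becomes its own word, so "dog-ate"
    # yields "dog" and "ate" instead of "dogate".
    words = re.findall(r'[a-zA-Z]+', text.lower())
    return words, Counter(''.join(words))

words, counts = count_letters("The! dog-ate #####the,cat")
print(words)        # ['the', 'dog', 'ate', 'the', 'cat']
print(counts['t'])  # 4
```

The counts are the same as with re.sub plus lower(); the only difference is that the word boundaries are still available if you need them.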

Answer 2

Just delete the values you don't want:

c = Counter(lines)
del c['#']
del c['-']
del c[',']
print(c)
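
Deleting keys one by one only works if you know every unwanted character in advance. A more general variant might filter the Counter by a predicate instead. This is a sketch using the sample string from the question; the names text and letters_only are illustrative.

```python
from collections import Counter

text = "The! dog-ate #####the,cat"
c = Counter(text)

# Keep only alphabetic keys. Note that str.isalpha() also accepts
# non-ASCII letters, unlike the [a-zA-Z] pattern.
letters_only = Counter({ch: n for ch, n in c.items() if ch.isalpha()})
print(letters_only)
```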

Answer 3

foo.txt:

asdas

!@#!@


asdljh


12j3l1k23j

From:

https://docs.python.org/3/library/string.html#string.ascii_letters

import string
from collections import Counter

with open('foo.txt') as f:
    text = f.read()

filtered_text = [char for char in text if char in string.ascii_letters]
counted = Counter(filtered_text)
print(counted.most_common())

Output

[('a', 3), ('j', 3), ('s', 3), ('d', 2), ('l', 2), ('h', 1), ('k', 1)]

Count of characters in a string, excluding special characters

3 answers: