Question

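I am opening a text file, displaying the total number of characters in it, and then classifying each character (letter, digit, punctuation, and so on). My range is ASCII 32-127, but for some reason the character count is higher than the count I get from online character counters.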

def totalLength():
    # The encoding argument is kept as posted; it is what the question is about.
    with open("draft_UTF-8.txt", 'r', encoding='ISO-8859-1') as inFile:
        readFile = inFile.read()
    print("Total amount of characters with spaces included:", len(readFile))
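
One possible reason for a count that is too high: if the file on disk is UTF-8 but is decoded as ISO-8859-1, every byte becomes its own character, so any non-ASCII character is counted several times. The sketch below assumes the two typographic quotes from the sample line are the only non-ASCII characters; the variable names are illustrative only.

```python
# The two typographic quotes from the sample line, written as escapes:
# U+201C (left double quote) and U+2019 (right single quote).
quotes = "\u201c\u2019"

utf8_bytes = quotes.encode("utf-8")            # each quote takes 3 bytes in UTF-8
misdecoded = utf8_bytes.decode("iso-8859-1")   # ISO-8859-1 maps every byte to one character

print(len(quotes))       # characters actually typed
print(len(utf8_bytes))   # bytes stored on disk
print(len(misdecoded))   # characters seen after decoding with the wrong codec
```

Under that assumption, two typed characters turn into six decoded ones, which is enough to push a total above what an online counter reports.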

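In addition, whenever I classify the file, my program reports characters outside the ASCII range, even though I did not put any such characters in the file. Here is my classification code.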

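import string

# The encoding argument is kept as posted; it is what the question is about.
with open("draft_UTF-8.txt", 'r', encoding='ISO-8859-1') as inFile:
    readFile = inFile.read()
alpha = 0
num = 0
space = 0
special = 0
other = 0
# Iterating over a string yields one character at a time.
for ch in readFile:
    if ch in string.ascii_letters:
        alpha += 1
    elif ch in string.digits:
        num += 1
    elif ch == ' ':
        space += 1
    elif ch in string.punctuation:
        special += 1
    else:
        # Anything unmatched lands here, including newline characters.
        other += 1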

I then print each category. My text file contains the following:

1234567890
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
~`!@#$%^&*()_-++|\}]{[“’:;?/>.<,

The output is:

Total amount of characters with spaces included: 101

There are 52 occurrences of alphabetical characters.

There are 10 occurrences of numerical characters.

There are 0 occurrences of white spaces.

There are 30 occurrences of punctuation characters.

there are 9 occurrences of other characters.

I found that the "other" characters come from the punctuation line, but I am not sure which ones. Any suggestions?
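
One way to find out which characters end up in the "other" bucket is to collect them and print an escaped form of each. The sketch below is not the original program: `classify` and `sample` are hypothetical names, and it assumes the file holds the four lines shown above, saved as UTF-8 with no trailing newline, then decoded as ISO-8859-1.

```python
import string
from collections import Counter

def classify(text):
    """Count characters per category and collect the ones that match none."""
    counts = Counter()
    others = []
    for ch in text:
        if ch in string.ascii_letters:
            counts["alpha"] += 1
        elif ch in string.digits:
            counts["num"] += 1
        elif ch == " ":
            counts["space"] += 1
        elif ch in string.punctuation:
            counts["special"] += 1
        else:
            counts["other"] += 1
            others.append(ch)
    return counts, others

# Assumed file contents; \u201c and \u2019 are the two typographic quotes.
sample = (
    "1234567890\n"
    "abcdefghijklmnopqrstuvwxyz\n"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ\n"
    "~`!@#$%^&*()_-++|\\}]{[\u201c\u2019:;?/>.<,"
)

# Reproduce the mismatch: UTF-8 bytes on disk, decoded as ISO-8859-1.
text = sample.encode("utf-8").decode("iso-8859-1")
counts, others = classify(text)

print(len(text), dict(counts))
print([ascii(ch) for ch in others])   # escaped form makes invisible characters visible
```

Under these assumptions the unmatched characters are the three newlines plus the six characters produced by misdecoding the two typographic quotes, which is consistent with the 9 reported above.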

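Edit: I found that the extra characters in my output are caused by the encoding: ISO-8859-1. My main problem is that Python will not run my program unless I specify this encoding, mainly because I am on Mac OS. It works without it in PyCharm, but when run with plain Python, my program crashes.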
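
A difference between PyCharm and a plain terminal may come from the default text encoding: when `open()` is called without an `encoding` argument, Python falls back to a locale-dependent default, and the two environments can be configured differently. The sketch below only inspects that default; `default_text_encoding` is a hypothetical helper name, and whether this explains the crash described here is an assumption.

```python
import locale
import sys

def default_text_encoding():
    """Return the encoding open() uses when no encoding argument is given."""
    return locale.getpreferredencoding(False)

print("open() default encoding:", default_text_encoding())
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```

Running this once in PyCharm and once in the terminal shows whether the two environments disagree. Passing `encoding` explicitly to `open()` removes the dependence on this default.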

Answer 1

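Oh, I see. Thanks, dhke. Before I added encoding = ISO-8859-1, I had several versions of the text file, one saved as UTF-8 and one as a plain .txt file. Originally, my program did not work with the plain .txt file but did work with the UTF-8 one. Then, in Python, the UTF-8 file did not work, although it did in PyCharm. So now I have the ISO-8859-1 encoding PLUS a UTF-8 file, and that is why. Thanks! I apologize for the silly question; I am new to coding, so I have only just figured this out.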
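
The mismatch described here can be checked by reading the same UTF-8 file with both codecs. This is a minimal sketch, not the original program: it writes a stand-in for draft_UTF-8.txt into a temporary directory, and the contents (four lines, no trailing newline) are an assumption based on the sample in the question.

```python
import os
import tempfile

# Assumed file contents; \u201c and \u2019 are the two typographic quotes.
sample = (
    "1234567890\n"
    "abcdefghijklmnopqrstuvwxyz\n"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ\n"
    "~`!@#$%^&*()_-++|\\}]{[\u201c\u2019:;?/>.<,"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "draft_UTF-8.txt")

    # Save the stand-in file as UTF-8 with plain "\n" line endings.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(sample)

    # Decode with the codec the file was actually saved in.
    with open(path, "r", encoding="utf-8") as f:
        right = f.read()

    # Decode with the codec used in the question.
    with open(path, "r", encoding="iso-8859-1") as f:
        wrong = f.read()

print(len(right))   # count when the codec matches the file
print(len(wrong))   # count when UTF-8 bytes are read as ISO-8859-1
```

With matching codecs the count drops by four, because each typographic quote is read as one character instead of three. The newlines are still included in both totals.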
