Question

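I want to count the non-alphanumeric characters in a text file so that I can use the count as one of the features in my feature set for text classification. Any help would be greatly appreciated.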

Answer 1

You can simply iterate over the file and use the str.isalnum() method to count the non-alphanumeric characters.

Something like:

count_special = 0
with open(filename, mode='r') as f:
    for line in f:
        count_special += sum(not x.isalnum() for x in line)
print(count_special)

After the loop, count_special holds the total number of non-alphanumeric characters.

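Because it processes one line at a time, this approach can usually handle large text files: it never has to load the whole file into memory first.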
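For anyone who wants to try the line-by-line approach end to end, here is a minimal sketch that wraps the same loop in a function and runs it on a throwaway file. The function name count_special, the encoding parameter, and the sample text are assumptions made for illustration; they are not part of the original answer.

```python
import os
import tempfile


def count_special(filename, encoding="utf-8"):
    """Count characters for which str.isalnum() is False, reading one line at a time."""
    total = 0
    with open(filename, mode="r", encoding=encoding) as f:
        for line in f:
            total += sum(not ch.isalnum() for ch in line)
    return total


# Write a small temporary file so the example runs anywhere (hypothetical sample text).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("Hello, world!\nfoo_bar 42\n")
    sample_path = tmp.name

sample_count = count_special(sample_path)
os.remove(sample_path)
print(sample_count)  # 7: one comma, "!", "_", two spaces, two newlines
```

Note that spaces and newlines are included in the total, because isalnum() is False for them as well.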

Answer 2

You can do it this way. First define the set of special characters as a regex:

import re
x = re.compile(r"[^A-Za-z0-9]")
# r"[^\w]" or r"\W" can also be used, but they are not equivalent:
# \w also matches "_" and, for str patterns, Unicode letters and digits

Then get the number of special characters as the length of the re.findall result:

s = ", : ; \" ' ? / > < { [ ) * & \n"
print(len(re.findall(x, s)))
29
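The choice of pattern changes the count, so it may be worth checking which definition of "special" suits the features. The sketch below compares the ASCII character class, \W, and str.isalnum() on one string. The sample text and variable names are made up for illustration.

```python
import re

ascii_only = re.compile(r"[^A-Za-z0-9]")  # anything outside ASCII letters and digits
non_word = re.compile(r"\W")              # anything that is not a Unicode word character

# Hypothetical sample: accented letters, an underscore, a space, digits, punctuation.
text = "na\u00efve_caf\u00e9 42!"

ascii_count = len(ascii_only.findall(text))            # counts the two accented letters, "_", " ", "!"
non_word_count = len(non_word.findall(text))           # counts only " " and "!"
isalnum_count = sum(not ch.isalnum() for ch in text)   # counts "_", " ", "!"

print(ascii_count, non_word_count, isalnum_count)  # 5 2 3
```

For plain ASCII text without underscores, all three approaches give the same number; they diverge on accented letters and on "_".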

Answer 3

You can use the string method "isalnum" (https://docs.python.org/3/library/stdtypes.html#str.isalnum), which returns False if the string contains any non-alphanumeric character.

>>> "Egg".isalnum()
True
>>> "[".isalnum()
False
>>> "-Spam-".isalnum()
False
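Calling isalnum() on a whole string only says whether every character is alphanumeric; to get a count, it has to be applied character by character. A minimal sketch, where the helper name count_non_alnum is an assumption for illustration:

```python
def count_non_alnum(text):
    """Return how many characters in text are not alphanumeric."""
    return sum(not ch.isalnum() for ch in text)


spam_count = count_non_alnum("-Spam-")
egg_count = count_non_alnum("Egg")

print("-Spam-".isalnum())  # False: at least one character is not alphanumeric
print(spam_count)          # 2: the two hyphens
print(egg_count)           # 0
print("".isalnum())        # False: the empty string has no alphanumeric characters
```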

Answer 4

This function returns the number of non-alphanumeric characters in a file:

def count_non_alphanumeric(filename):
    nonalpha_count = 0  # running count of non-alphanumeric characters
    with open(filename, "r") as f:  # open the file as f
        for line in f:  # for each line in the file...
            for ch in line:  # for each character in the line...
                if not ch.isalnum():  # True when the character is not alphanumeric
                    nonalpha_count += 1
    return nonalpha_count
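One thing to keep in mind with any isalnum()-based count is that spaces, tabs, and newlines are non-alphanumeric too, so they inflate the total. If only punctuation and symbols should count as "special", a variant like the following sketch may be closer to what is wanted. The function name count_special_in_lines and the skip_whitespace flag are assumptions; it takes any iterable of lines (an open file works) so it can be tried here without a real file.

```python
import io


def count_special_in_lines(lines, skip_whitespace=True):
    """Count non-alphanumeric characters, optionally ignoring whitespace."""
    count = 0
    for line in lines:
        for ch in line:
            if ch.isalnum():
                continue  # letters and digits are never counted
            if skip_whitespace and ch.isspace():
                continue  # ignore spaces, tabs, and newlines when requested
            count += 1
    return count


# io.StringIO stands in for an open text file (hypothetical sample text).
sample = io.StringIO("Hello, world!\nfoo_bar 42\n")
with_ws_skipped = count_special_in_lines(sample)
sample.seek(0)  # rewind so the same data can be read again
with_ws_counted = count_special_in_lines(sample, skip_whitespace=False)

print(with_ws_skipped, with_ws_counted)  # 3 7
```

Which variant is the better feature for classification depends on the data, so it may be worth trying both.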

How do I count special characters in a text file in Python?

4 answers: