Find all spaces, newlines, and tabs in a Python file

Time: 2015-09-26 00:11:50

Tags: python python-3.x

def count_spaces(filename): 
    input_file = open(filename,'r') 
    file_contents = input_file.read() 
    space = 0 
    tabs = 0 
    newline = 0 
    for line in file_contents == " ": 
        space +=1 
        return space
    for line in file_contents == '\t': 
        tabs += 1 
        return tabs 
    for line in file_contents == '\n': 
        newline += 1
        return newline 
    input_file.close()

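I'm trying to write a function that takes a filename as an argument and returns the total number of spaces, newlines, and tabs in the file. I want to use a basic for loop and if statements, but I'm struggling at the moment. Any help would be greatly appreciated.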

4 answers:

Answer 0: (score: 0)

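from collections import Counter
C = Counter(open(afile).read())  # afile is the path to your file
C[' ']  # number of spaces; C['\t'] and C['\n'] work the same way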

Answer 1: (score: 0)

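Your current code doesn't work because you combine loop syntax (for x in y) with a conditional test (x == y) in a single muddled statement. You need to separate the two.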

You also need to use only one return statement; otherwise the first return you reach ends the function and the other values are never returned.

Try:

for character in file_contents:
    if character == " ":
        space +=1
    elif character == '\t': 
        tabs += 1
    elif character == '\n': 
        newline += 1
return space, tabs, newline
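
Put together as a complete function, the loop might look like the sketch below. It reuses the variable names from the question; the with block is my own addition (not in the question's code) so that the file is closed even if reading fails.

```python
def count_spaces(filename):
    """Return (spaces, tabs, newlines), counted one character at a time."""
    space = tabs = newline = 0
    # 'with' closes the file automatically, even if reading raises an error
    with open(filename, 'r') as input_file:
        file_contents = input_file.read()
    for character in file_contents:
        if character == " ":
            space += 1
        elif character == '\t':
            tabs += 1
        elif character == '\n':
            newline += 1
    # a single return, reached only after every character has been examined
    return space, tabs, newline
```

Calling count_spaces("some_file.txt") then gives a 3-tuple that you can unpack or sum.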

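The code in Joran Beasley's answer is a more Pythonic way to solve this problem. Instead of writing a separate condition for each kind of character, you can use the collections.Counter class to count the occurrences of every character in the file and then extract the counts for the whitespace characters at the end. A Counter works much like a dictionary.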

from collections import Counter

def count_spaces(filename):
    with open(filename) as in_f:
        text = in_f.read()
    count = Counter(text)
    return count[" "], count["\t"], count["\n"]
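
Since the question asks for one total rather than three separate counts, here is a small sketch that adds them up. The function name count_whitespace_total is just something I made up for illustration.

```python
from collections import Counter

def count_whitespace_total(filename):
    """Return the combined number of spaces, tabs, and newlines."""
    with open(filename) as in_f:
        count = Counter(in_f.read())
    # a Counter returns 0 for characters that never occur, so no KeyError
    return count[" "] + count["\t"] + count["\n"]
```

Returning the tuple and calling sum() on it at the call site would work just as well; which one reads better depends on whether you ever need the individual counts.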

Answer 2: (score: 0)

To support large files, you can read a fixed number of bytes at a time:

#!/usr/bin/env python
from collections import namedtuple

Count = namedtuple('Count', 'nspaces ntabs nnewlines')

def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb') as file:
        chunk = file.read(chunk_size)
        while chunk:
            nspaces   += chunk.count(b' ')
            ntabs     += chunk.count(b'\t')
            nnewlines += chunk.count(b'\n')
            chunk = file.read(chunk_size)
    return Count(nspaces, ntabs, nnewlines)

if __name__ == "__main__":
    print(count_spaces(__file__))

Output

Count(nspaces=150, ntabs=0, nnewlines=20)
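
If you find the read-before-the-loop, read-at-the-end pattern awkward, the same chunked loop can be written with the two-argument form of iter(). This is only a sketch of an equivalent spelling; the name count_spaces_iter is hypothetical, and it makes the same assumptions about an ASCII-based encoding and b'\n' newlines.

```python
from functools import partial

def count_spaces_iter(filename, chunk_size=1 << 13):
    """Count spaces, tabs, and newlines, reading chunk_size bytes at a time."""
    nspaces = ntabs = nnewlines = 0
    with open(filename, 'rb') as file:
        # iter(callable, sentinel) keeps calling file.read(chunk_size)
        # until it returns b'' (end of file)
        for chunk in iter(partial(file.read, chunk_size), b''):
            nspaces += chunk.count(b' ')
            ntabs += chunk.count(b'\t')
            nnewlines += chunk.count(b'\n')
    return nspaces, ntabs, nnewlines
```

Because every target is a single byte, a match can never straddle a chunk boundary, so the chunk size does not affect the result.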

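mmap lets you treat a file as a bytestring without actually loading the whole file into memory; for example, you can search it for a regular-expression pattern: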

#!/usr/bin/env python3
import mmap
import re
from collections import Counter, namedtuple

Count = namedtuple('Count', 'nspaces ntabs nnewlines')

def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb', 0) as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
        c = Counter(m.group() for m in re.finditer(br'[ \t\n]', s))
        return Count(c[b' '], c[b'\t'], c[b'\n'])

if __name__ == "__main__":
    print(count_spaces(__file__))

Output

Count(nspaces=107, ntabs=0, nnewlines=18)
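
One caveat with the mmap approach: mmap.mmap() raises ValueError when asked to map an empty file. If empty files are possible in your case, a guard like the one sketched here avoids that. The name count_spaces_mmap and the choice to return zeros are my assumptions, not part of the code above.

```python
import mmap
import os
import re
from collections import Counter

def count_spaces_mmap(filename):
    """Count spaces, tabs, and newlines via mmap; empty files yield zeros."""
    # a zero-length file cannot be memory-mapped, so handle it up front
    if os.path.getsize(filename) == 0:
        return 0, 0, 0
    with open(filename, 'rb') as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
        # assume ascii-based encoding and b'\n' newline, as above
        c = Counter(m.group() for m in re.finditer(br'[ \t\n]', s))
    return c[b' '], c[b'\t'], c[b'\n']
```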

Answer 3: (score: 0)

In my case, tabs (\t) had been converted to "    " (four spaces), so I modified the logic a bit to take care of that.

def count_spaces(filename):
    with open(filename,"r") as f1:
        contents=f1.readlines()

    total_tab=0
    total_space=0
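    for line in contents:
        # a run of four spaces is treated as one (converted) tab
        total_tab += line.count("    ")
        total_tab += line.count("\t")
        # note: this also counts the spaces that make up four-space tabs
        total_space += line.count(" ")
    # len(contents) is the number of lines; it equals the newline count
    # only when the file ends with a newline
    print("Space count = ",total_space)
    print("Tab count = ",total_tab)
    print("New line count = ",len(contents))
    return total_space,total_tab,len(contents)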
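
If you would rather not have the spaces inside a four-space run counted a second time as plain spaces, one possible variant is sketched below. The function name and the tab_width parameter are hypothetical, and it counts actual "\n" characters instead of lines.

```python
def count_spaces_expanded_tabs(filename, tab_width=4):
    """Count spaces, tabs, and newlines, treating each run of tab_width
    spaces as one tab rather than as individual spaces."""
    soft_tab = " " * tab_width
    total_space = total_tab = total_newline = 0
    with open(filename, "r") as f1:
        for line in f1:
            soft_tabs = line.count(soft_tab)  # non-overlapping runs
            total_tab += soft_tabs + line.count("\t")
            # subtract the spaces already counted as part of a soft tab
            total_space += line.count(" ") - soft_tabs * tab_width
            total_newline += line.count("\n")
    return total_space, total_tab, total_newline
```

Whether a run of four spaces really was a tab is a guess, of course, so this only makes sense if you know the file was converted that way.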