Question

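I have a large text file containing a few thousand words. I want to read it and display the number of lines, words, and characters it contains. I have defined three functions to do this, but when I run the word count I only get 38 as the number of words in the file. I know this is incorrect.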

Question: how do I process a text file to count its lines, words, and characters, and print the results?

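def linecount(file):
    # Count lines: add 1 for every line the file yields
    # (the original overwrote count with a list via count = line.split())
    count = 0
    with open(file, 'r') as f:
        for line in f:
            count += 1
    return count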


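def wordcount(file):
    # Count words: str.split() with no arguments splits on any run of whitespace
    count = 0
    with open(file, 'r') as f:
        for line in f:
            allwords = line.split()
            count += len(allwords)
    return count


def charcount(file):
    # Count characters, including the newline at the end of each line
    count = 0
    with open(file, 'r') as f:
        for line in f:
            count += len(line)
    return count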

print('The file has:')
print('    ', linecount('test.txt'), ' lines')
print('    ', wordcount('test.txt'), ' words')
print('    ', charcount('test.txt'), ' characters')

Answer 1

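There are several ways to count the contents of a text file, but the fastest is to call the wc command. Several approaches are compared in this gist.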
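The answer recommends wc but shows no code for calling it. A minimal sketch of doing so from Python with subprocess follows, assuming a POSIX wc executable is on PATH (present on Linux and macOS, not on stock Windows); the helper name wc_counts is hypothetical. Note that wc -c counts bytes, which differs from characters for multibyte encodings; wc -m counts characters instead.

```python
import subprocess


def wc_counts(filename):
    """Return (lines, words, bytes) for filename as reported by wc.

    Hypothetical helper; assumes a POSIX `wc` executable is on PATH.
    """
    # wc always prints its counts in a fixed order (lines, words, bytes),
    # followed by the file name, regardless of the order of the flags
    result = subprocess.run(['wc', '-l', '-w', '-c', filename],
                            capture_output=True, text=True, check=True)
    # The counts come first, so a file name containing spaces does not matter
    lines, words, nbytes = (int(field) for field in result.stdout.split()[:3])
    return lines, words, nbytes
```

For example, print(wc_counts('test.txt')) prints the three counts as a tuple. The extra process start-up only pays off for large files.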

If you specifically want pure Python, buffered reading is the fastest:

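def bufcount(filename):
    # Read the file in 1 MiB chunks and count the newline characters in each;
    # like wc -l, a final line without a trailing newline is not counted
    lines = 0
    buf_size = 1024 * 1024
    with open(filename) as f:
        read_f = f.read  # loop optimization: avoids an attribute lookup per iteration

        buf = read_f(buf_size)
        while buf:
            lines += buf.count('\n')
            buf = read_f(buf_size)

    return lines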

The word-count code you posted looks like it would count each blank line as a word.
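bufcount above counts only lines. A sketch extending the same chunked-read idea to words and characters follows; the function name buffered_counts and its buf_size parameter are hypothetical. The one subtlety is a word that straddles two chunks: str.split() on each chunk would count it twice, so the code tracks whether the previous chunk ended mid-word and subtracts one in that case.

```python
def buffered_counts(filename, buf_size=1024 * 1024):
    """Return (lines, words, characters), reading the file in fixed-size chunks.

    Hypothetical helper; lines are counted as newline characters, like wc -l.
    """
    lines = words = chars = 0
    in_word = False  # True if the previous chunk ended in the middle of a word
    with open(filename) as f:
        while True:
            buf = f.read(buf_size)
            if not buf:
                break
            lines += buf.count('\n')
            chars += len(buf)
            n = len(buf.split())
            # If the last chunk ended mid-word and this one starts with a
            # non-space character, its first token continues that word
            if in_word and not buf[0].isspace():
                n -= 1
            words += n
            in_word = not buf[-1].isspace()
    return lines, words, chars
```

The result does not depend on buf_size; a small value such as 3 is handy for testing the chunk-boundary logic.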

Answer 2

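from functools import reduce  # reduce is no longer a builtin in Python 3

def countLinesWordsCharacters(text):
    # Map each line to (1, words, characters) and add the tuples element-wise;
    return reduce(lambda a, b: tuple(x + y for x, y in zip(a, b)),
                  ((1, len(line.split()), len(line)) for line in text),
                  (0, 0, 0))  # initializer so an empty file gives (0, 0, 0)

fileName = 'test.txt'  # file name taken from the question
with open(fileName) as text:
    print(countLinesWordsCharacters(text))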

This prints (5, 32, 432), i.e. the number of lines, words, and characters in the given file.

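This is only meant to give an impression of how this can be done in a functional style. The word detector needs improvement (currently it is just len(line.split()), which is of course too crude).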

But I think the general idea is clear.
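The answer calls len(line.split()) too crude because it counts punctuation-only tokens such as "--" as words. One possible refinement is sketched below, keeping the same reduce shape but with a pluggable word counter. The regular expression WORD_RE and the names count_words and count_lines_words_chars are hypothetical; what counts as a "word" is an assumption here (runs of word characters, optionally joined by apostrophes or hyphens, so "it's" and "well-known" count once). Python's \w is Unicode-aware and also matches digits and underscores.

```python
import re
from functools import reduce

# Hypothetical definition of a word: word characters, optionally joined
# by an apostrophe or hyphen ("it's", "well-known")
WORD_RE = re.compile(r"\w+(?:['-]\w+)*")


def count_words(line):
    """Count words in one line using WORD_RE instead of str.split()."""
    return len(WORD_RE.findall(line))


def count_lines_words_chars(lines, word_counter=count_words):
    # Same functional shape as the answer: map each line to a tuple,
    # then add the tuples element-wise, starting from (0, 0, 0)
    return reduce(lambda a, b: tuple(x + y for x, y in zip(a, b)),
                  ((1, word_counter(line), len(line)) for line in lines),
                  (0, 0, 0))
```

For "Hello, world -- it's a well-known fact." count_words returns 6, whereas split() yields 7 tokens because "--" is counted. Passing word_counter=lambda line: len(line.split()) restores the original behavior.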
