Question

Here is the problem:

I have a file containing the following text:

hey how are you
I am fine and you
Yes I am fine

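The task is to find the number of words, lines, and characters.

Below is my program, but the character count without spaces is incorrect.

The word count and the line count are correct. What is the mistake in the same loop?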

fname = input("Enter the name of the file:")
infile = open(fname, 'r')
lines = 0
words = 0
characters = 0
for line in infile:
    wordslist = line.split()
    lines = lines + 1
    words = words + len(wordslist)
    characters = characters + len(line)
print(lines)
print(words)
print(characters)

The output is:

lines=3 (correct)
words=13 (correct)
characters=47

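I have seen multiple answers on this site and I am confused, because I have not yet learned some of the other Python functions they use. How do I correct the code so that it stays as simple and basic as what I did in the loop?

The number of characters without spaces is 35, and with spaces it is 45. If possible, I want to find the number of characters without spaces. Even if someone only knows the loop for the number of characters with spaces, that is fine too.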

Answer 1

Sum up the lengths of all the words in a line:

characters += sum(len(word) for word in wordslist)

The whole program:

with open('my_words.txt') as infile:
    lines=0
    words=0
    characters=0
    for line in infile:
        wordslist=line.split()
        lines=lines+1
        words=words+len(wordslist)
        characters += sum(len(word) for word in wordslist)
print(lines)
print(words)
print(characters)

Output:

3
13
35

This:

(len(word) for word in wordslist)

is a generator expression. It is essentially a loop written in one line that yields the length of each word. We feed these lengths directly to sum:

sum(len(word) for word in wordslist)
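If the generator expression looks unfamiliar, it may help to see it next to the plain loop it replaces. This is only a sketch: the sample line is a stand-in for one line of the file, and the names total and total_gen are made up for illustration.

```python
# Hypothetical sample line standing in for one line read from the file
line = "hey how are you\n"
wordslist = line.split()

# Explicit loop: add up the length of each word
total = 0
for word in wordslist:
    total += len(word)

# Generator expression: the same loop written inline and fed to sum()
total_gen = sum(len(word) for word in wordslist)

print(total, total_gen)  # 12 12
```

Both forms give the same number, so you can use whichever one you find easier to read.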

Improved version

This version takes advantage of enumerate, so you save two lines of code while keeping it readable:

with open('my_words.txt') as infile:
    words = 0
    characters = 0
    for lineno, line in enumerate(infile, 1):
        wordslist = line.split()
        words += len(wordslist)
        characters += sum(len(word) for word in wordslist)

print(lineno)
print(words)
print(characters)

This line:

with open('my_words.txt') as infile:

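opens the file and promises to close it as soon as you leave the indented block. Closing a file once you are done with it is always good practice.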
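A quick way to convince yourself that the file really is closed after the block is to look at its closed attribute. The sketch below writes a throwaway temporary file so it can run anywhere; the temporary path and the names inside and outside are assumptions for illustration, not part of the program above.

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained (hypothetical path)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hey how are you\n")
    path = tmp.name

with open(path) as infile:
    first_line = infile.readline()
    inside = infile.closed   # still open inside the block

outside = infile.closed      # closed as soon as the block is left

print(inside, outside)  # False True
os.remove(path)          # clean up the temporary file
```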

Answer 2

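Remember that every line (except the last one) ends with a line separator: "\r\n" on Windows or "\n" on Linux and Mac.

So in this case exactly two extra characters are counted, which gives 47 instead of 45.

A good way to get around this is to use: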

import os

fname=input("enter the name of the file:")
infile=open(fname, 'r')
lines=0
words=0
characters=0
for line in infile:
    line = line.strip(os.linesep)
    wordslist=line.split()
    lines=lines+1
    words=words+len(wordslist)
    characters=characters+ len(line)
print(lines)
print(words)
print(characters)
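To see where the two extra characters come from, you can print the length of each line with and without its newline. This is a sketch that uses io.StringIO as an in-memory stand-in for the file in the question, assuming the last line has no newline after it. In Python 3 text mode, line endings are translated to "\n" on reading by default, so stripping "\n" is enough here.

```python
import io

# In-memory stand-in for the file from the question (no newline after the last line)
infile = io.StringIO("hey how are you\nI am fine and you\nYes I am fine")

lengths_raw = []
lengths_stripped = []
for line in infile:
    lengths_raw.append(len(line))                     # includes the trailing "\n"
    lengths_stripped.append(len(line.rstrip("\n")))   # newline removed

print(lengths_raw, sum(lengths_raw))            # [16, 18, 13] 47
print(lengths_stripped, sum(lengths_stripped))  # [15, 17, 13] 45
```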

Answer 3

To count the characters, you should count them word by word. So you can add another loop that counts the characters:

for word in wordslist:
    characters += len(word)

That should do it. The word list should probably have the trailing newline removed, perhaps with wordslist = line.rstrip().split().
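As far as I can tell, split() with no arguments already throws away leading and trailing whitespace, including the newline, so the extra rstrip() is harmless but optional. A small check, using a made-up sample line:

```python
# Hypothetical sample line with a trailing newline
line = "I am fine and you\n"

plain = line.split()                    # split on any run of whitespace
stripped_first = line.rstrip().split()  # strip the newline first, then split

print(plain)                    # ['I', 'am', 'fine', 'and', 'you']
print(plain == stripped_first)  # True
```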

Answer 4

This is too long for a comment.

Python 2 or 3? Because it really matters. Try the following in the REPL of each:

Python 2.7.12
>>> len("taña")
5

Python 3.5.2
>>> len("taña")
4

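Huh? The answer lies in Unicode. That ñ is an 'n' with a combining diacritic, which means it is 1 character but not 1 byte. So unless you are working with plain ASCII text, you had better specify which version of Python your character count function is for.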
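The difference can be reproduced in Python 3 alone by counting code points versus encoded bytes. This is a sketch, assuming UTF-8 as the encoding; the variable names are made up. It also shows that the same visible text can have different lengths depending on whether the ñ is stored as one precomposed code point or as 'n' plus a combining tilde.

```python
import unicodedata

word = "ta\u00f1a"  # "taña" written with a precomposed ñ (U+00F1)

composed = unicodedata.normalize("NFC", word)
decomposed = unicodedata.normalize("NFD", word)  # 'n' followed by a combining tilde

print(len(composed))                  # 4 code points
print(len(composed.encode("utf-8")))  # 5 bytes, the number Python 2 reports for a byte string
print(len(decomposed))                # 5 code points: t, a, n, combining tilde, a
```

So "number of characters" needs a definition (bytes, code points, or what the reader sees) before any of the counts can be called right or wrong.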

Answer 5

How about this? It uses a regular expression to match all non-whitespace characters and returns the number of matches in the string.

import re

DATA="""
hey how are you
I am fine and you
Yes I am fine
"""

def get_char_count(s):
    return len(re.findall(r'\S', s))

if __name__ == '__main__':
    print(get_char_count(DATA))

Output:

35


Answer 6

I found this solution very simple and readable:

with open("filename", 'r') as file:
    text = file.read().strip().split()
    len_chars = sum(len(word) for word in text)
    print(len_chars)

Answer 7

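It is probably counting the newline characters. Subtract the number of newlines, (lines - 1) for this file, from characters.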

Answer 8

Here is the code:

# Open in binary mode and decode, so len() counts characters rather than bytes
fp = open(fname, 'rb').read()
chars = fp.decode('utf8')
print(len(chars))

Check the output. I just tried it.

Answer 9

A more Pythonic solution than the others:

with open('foo.txt') as f:
  text = f.read().splitlines() # list of lines

lines = len(text) # length of the list = number of lines
words = sum(len(line.split()) for line in text) # split each line on spaces, sum up the lengths of the lists of words
characters = sum(len(line) for line in text) # sum up the length of each line

print(lines)
print(words)
print(characters)

The other answers here are manually doing what str.splitlines() already does. There is no reason to reinvent the wheel.
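For anyone unsure what str.splitlines() does with different line endings, here is a small sketch. The sample string is made up: it deliberately mixes "\r\n" and "\n" and ends with a newline.

```python
# Hypothetical text mixing Windows and Unix line endings, with a trailing newline
text = "hey how are you\r\nI am fine and you\nYes I am fine\n"

lines = text.splitlines()  # line endings are removed, no empty last element

print(lines)                             # ['hey how are you', 'I am fine and you', 'Yes I am fine']
print(len(lines))                        # 3
print(sum(len(line) for line in lines))  # 45
```

Because the line endings are dropped for you, the character count comes out as 45 (spaces included) without any manual stripping.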

Answer 10

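You do have the right answer - your code is completely correct. What I think is happening is that an end-of-line character is being passed through, which increases your character count by 2 (there isn't one on the last line, as there is no new line to go to). If you want to remove it, the simple fudge would be to do as Loaf suggested: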

characters = characters - (lines - 1)

For the second part, see csl's answer......
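One caveat with the lines - 1 fudge: it assumes the last line has no newline after it. If the file does end with a newline, the fudge is off by one. The sketch below compares it with counting the newlines directly; chars_without_newlines is a made-up helper name and the strings stand in for the file contents.

```python
def chars_without_newlines(text):
    """Count characters, excluding the newline characters themselves."""
    return len(text) - text.count("\n")

no_trailing = "hey how are you\nI am fine and you\nYes I am fine"
with_trailing = no_trailing + "\n"  # same text, but the file ends with a newline

lines = 3
fudge_no_trailing = len(no_trailing) - (lines - 1)
fudge_with_trailing = len(with_trailing) - (lines - 1)

print(fudge_no_trailing, fudge_with_trailing)  # 45 46
print(chars_without_newlines(no_trailing),
      chars_without_newlines(with_trailing))   # 45 45
```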

Answer 11

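You have two problems. One is the line endings and the other is the spaces in between.

Plenty of people have already posted pretty good answers, but I find this method easier to understand: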

characters = characters + len(line.strip()) - line.strip().count(' ')

line.strip() removes leading and trailing whitespace. Then I subtract the number of spaces from the total length.
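Here is that formula worked through on one line, as a sketch with made-up sample lines. Note that count(' ') only counts plain spaces, so a line containing tabs would still have them counted as characters; summing the word lengths from split() avoids that.

```python
# Hypothetical line with extra leading and trailing spaces
line = "  I am fine and you \n"
stripped = line.strip()                      # "I am fine and you"
count = len(stripped) - stripped.count(" ")  # 17 characters minus 4 spaces

print(count)  # 13

# A tab is whitespace but not a space, so the formula still counts it
tabbed = "I\tam fine\n"
tabbed_stripped = tabbed.strip()
by_formula = len(tabbed_stripped) - tabbed_stripped.count(" ")
by_words = sum(len(word) for word in tabbed.split())

print(by_formula, by_words)  # 8 7
```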

Answer 12

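Simply skip the unwanted characters when calling len,

# Text mode already translates line endings to '\n', so compare against that
characters = characters + len([c for c in line if c not in ('\n', ' ')])

or count them with sum,

characters = characters + sum(1 for c in line if c not in ('\n', ' '))

or build a str from wordslist and take its len,

characters = characters + len(''.join(wordslist))

or sum the characters in wordslist. I think this is the fastest.

characters = characters + sum(1 for word in wordslist for char in word)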
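Which variant is fastest is easy to measure rather than guess. The sketch below first checks that all four give the same count for a sample line, then times them with timeit. The sample line, the dictionary of lambdas and the repeat count are all assumptions for illustration, and the timings will differ between machines and Python versions.

```python
import timeit

# Hypothetical sample line standing in for one line of the file
line = "hey how are you\n"
wordslist = line.split()

approaches = {
    "filter chars": lambda: len([c for c in line if c not in ("\n", " ")]),
    "sum of ones": lambda: sum(1 for c in line if c not in ("\n", " ")),
    "join words": lambda: len("".join(wordslist)),
    "nested sum": lambda: sum(1 for word in wordslist for char in word),
}

# All four should agree on the count before their speed is compared
results = {name: fn() for name, fn in approaches.items()}
print(results)

# Time each approach; only the relative order is of interest
for name, fn in approaches.items():
    print(name, timeit.timeit(fn, number=10000))
```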

Answer 13

Very simple:

f = open('file.txt', 'rb')
f.seek(0) # Move to the start of file
print(len(f.read())) # Number of bytes, including spaces and newlines
f.close()

Answer 14

This is the smallest program I could come up with, using less memory, to solve your problem:

with open('FileName.txt') as f:
  lines = f.readlines()
  data = ''.join(lines)
  print('lines =',len(lines))
  print('Words = ',len(data.split()))
  data = ''.join(data.split())
  print('characters = ',len(data))

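lines is a list of lines, so the length of lines is simply the number of lines. Next, data holds the contents of the file as one string (with the words separated by whitespace), so if we split data we get the list of words in your file. The length of that list therefore gives the number of words. Finally, if we join the list of words back together, we get all the characters as a single string, so its length gives the number of characters.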
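The steps described above can be followed with a list that stands in for what readlines() would return for the file in the question. This is a sketch; the list literal and the names words and squeezed are made up.

```python
# Stand-in for f.readlines() on the file from the question
lines = ["hey how are you\n", "I am fine and you\n", "Yes I am fine"]

data = "".join(lines)      # whole file as one string
words = data.split()       # split on any whitespace, newlines included
squeezed = "".join(words)  # all words glued together, no whitespace left

print(len(lines), len(words), len(squeezed))  # 3 13 35
```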

Answer 15

Take the file name (e.g. files.txt) as input, then count the total number of characters in the file and store it in the variable char.

fname = input("Enter the name of the file:")
infile = open(fname, 'r')                   # open the file for reading
lines = 0
words = 0
char = 0                                    # start every counter at zero
for line in infile:
    wordslist = line.split()                # split the line into words
    lines = lines + 1                       # count the line
    words = words + len(wordslist)          # add the words on this line
    char = char + len(line)                 # add the characters, newline included

print("lines are: " + str(lines))
print("words are: " + str(words))
print("chars are: " + str(char))            # labelled output

Answer 16

num_lines = sum(1 for line in open('filename.txt'))
num_words = sum(1 for word in open('filename.txt').read().split())
num_chars = sum(len(word) for word in open('filename.txt').read().split())

Finding the number of characters in a file using Python
