Question

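I am trying to compute the average length of the words read from a file. However, the text in the file is not formatted with normal sentence structure: sometimes there is an extra space between words, and there are line breaks in the middle of a sentence.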

Current code

def average(filename):
    with open(filename, "r") as f:
        for line in f:
            words = line.split()
            average = sum(len(words) for words in words)/len(words)
            return average

>>>4.3076923076923075

Expected
>>>4.352941176470588

File

Here are some words   there is no punctuation but there are words what
is the average length

Answer 1

When you open the file as f and run

for x in f:

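x will be each line of the file in turn, terminated by a newline character. The answer you are getting is exactly correct for the first line of text: because the return statement sits inside the loop, the function exits after processing that first line. If you want the second line to be counted together with the first, you need to process the text file as a whole rather than line by line.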
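To see this behaviour without touching the disk, here is a minimal sketch that feeds the sample text from the question through io.StringIO, which iterates line by line much like an open text file. The helper name first_line_average and the variable names are made up for this illustration; the loop body mirrors the code in the question.

```python
import io

# The sample text from the question, including the extra spaces and the line break.
text = (
    "Here are some words   there is no punctuation but there are words what\n"
    "is the average length\n"
)

def first_line_average(f):
    """Mirror the question's loop: the return fires during the first iteration."""
    for line in f:
        words = line.split()  # split() with no argument collapses runs of whitespace
        return sum(len(w) for w in words) / len(words)

# Iterating yields one string per line, each still ending in "\n".
lines = list(io.StringIO(text))
print(len(lines), repr(lines[0][-1]))

# Only the 13 words of the first line are averaged.
first = first_line_average(io.StringIO(text))
print(first)
```

The extra spaces are not the problem here, since split() already ignores them; the early return is what drops the second line.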

Assuming you want the average length of all the words in the file, the following should work somewhat better:

def average(filename):
    with open(filename, "r") as f:
        lines = [line for line in f]
        words = " ".join(lines).split()
        average = sum(len(word) for word in words)/len(words)
    return average
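A slightly shorter variant of the same idea, offered as a sketch rather than the only way to do it: f.read() returns the whole file as one string, and split() with no argument treats spaces, repeated spaces, and newlines alike, so joining the lines first is not needed. The function name average_word_length and the choice to return 0.0 for a file with no words are assumptions added here; the original code would raise ZeroDivisionError in that case.

```python
import os
import tempfile

def average_word_length(filename):
    """Average length of all whitespace-separated words in the file."""
    with open(filename, "r") as f:
        words = f.read().split()  # whole file at once; newlines count as whitespace
    if not words:
        return 0.0  # assumption: treat a file with no words as average 0.0
    return sum(len(word) for word in words) / len(words)

# Usage: write the sample text from the question to a temporary file.
sample = (
    "Here are some words   there is no punctuation but there are words what\n"
    "is the average length\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
try:
    result = average_word_length(tmp.name)
finally:
    os.remove(tmp.name)
print(result)
```

With the sample text this averages 74 characters over 17 words, which should agree with the expected value given in the question. Reading the whole file into memory is fine for small inputs; for very large files, keeping running totals of characters and words while iterating line by line would avoid that.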
