Question

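I have to evaluate a text file containing a poorly formatted essay, and I have to reformat it. The first step is to remove all the extra whitespace in the sentences. I decided to read the file in, collect all the lines, and then put the lines that contain sentences into their own separate list. Now I'm having trouble deciding how to remove the extra spaces from the items in that list, and I'm wondering whether there is a built-in method I can use to remove the extra whitespace?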

Here is an example of one sentence in my list:

["Albuquerque is my     turkey and he's   feathered and    he's      fine, And    he"]

And the code I have so far:

def remove_extra_whitespaces():
    fileList = []
    removeList = []
    # Read every line, stripping leading/trailing whitespace (including the newline).
    with open("essay1.txt", 'r') as infile:
        for line in infile:
            fileList.append(line.strip())
    print(len(fileList[4]))
    # Keep only the non-empty lines; the extra inner spaces are still present here.
    for k in range(len(fileList)):
        if len(fileList[k]) > 0:
            #" ".join(fileList[k])
            removeList.append(fileList[k])
    return removeList

Answer 1

I think this is the simplest way:

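import re
# Named "text" rather than "str" to avoid shadowing the built-in type.
text = "Albuquerque is my     turkey and he's   feathered and    he's      fine, And    he"
# Replace every run of one or more spaces with a single space.
print(re.sub(r' +', ' ', text))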

Output:

Albuquerque is my turkey and he's feathered and he's fine, And he
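
Since the question asks specifically about built-in methods, here is a minimal sketch of an alternative that needs no import: calling `split()` with no arguments and rejoining with a single space. The helper name `collapse_whitespace` and the `lines` variable are made up for illustration and do not come from the question.

```python
def collapse_whitespace(text):
    # split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and discards empty pieces.
    return " ".join(text.split())

# Hypothetical list standing in for the asker's list of sentences.
lines = ["Albuquerque is my     turkey and he's   feathered and    he's      fine, And    he"]
cleaned = [collapse_whitespace(line) for line in lines]
print(cleaned[0])
```

Note that this behaves slightly differently from the `r' +'` pattern: it also collapses tabs and newlines, and it drops leading and trailing whitespace, which may or may not be what you want for an essay.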
