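I have to process a text file containing a badly formatted essay, which I need to reformat. The first step is to remove all the extra whitespace in the sentences. I decided to read in the file, put all of the lines into a string, and then put the lines that contain sentences into their own separate list. Now I'm having trouble deciding how to remove the extra whitespace from the items in that list, and I'm wondering whether there is a built-in method I can use to do it?
Here is an example of one sentence from my list: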
["Albuquerque is my turkey and he's feathered and he's fine, And he"]
And my code so far:
def remove_extra_whitespaces():
    fileList = []
    removeList = []
    infile = open("essay1.txt", 'r')
    for line in infile:
        if len(line) > 0:
            fileList.append(line.strip())
        else:
            fileList.append(line)
    print(len(fileList[4]))
    for k in range(len(fileList)):
        if len(fileList[k]) > 0:
            #" ".join(fileList[k])
            removeList.append(fileList[k])
Answer 0: (score: 1)
I think this is the simplest way:
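import re
# Named "text" rather than "str" so the built-in str type is not shadowed.
text = "Albuquerque is my turkey and he's feathered and he's fine, And he"
# Replace every run of one or more spaces with a single space.
print(re.sub(r' +', ' ', text))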
Output:
Albuquerque is my turkey and he's feathered and he's fine, And he
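Since the question works with a list of lines rather than a single string, here is a minimal sketch of how the same idea might be applied to each item. The function names (`collapse_spaces`, `collapse_spaces_regex`, `clean_lines`) and the sample list are hypothetical, made up for illustration; the extra spaces in the sample were added by hand because the example sentence above shows only single spaces. It also shows the built-in alternative to the regex, `" ".join(text.split())`.

```python
import re

def collapse_spaces(text):
    # Built-in approach: split() with no arguments splits on any run of
    # whitespace and drops leading/trailing whitespace, so joining the
    # pieces with one space leaves exactly one space between words.
    return " ".join(text.split())

def collapse_spaces_regex(text):
    # Regex approach from the answer: only runs of the space character
    # are collapsed; tabs and leading/trailing spaces are not removed.
    return re.sub(r" +", " ", text)

def clean_lines(lines):
    # Normalize every line, then keep only the lines that are not empty.
    cleaned = [collapse_spaces(line) for line in lines]
    return [line for line in cleaned if line]

# Hypothetical sample data with extra spaces inserted for illustration.
sample = ["Albuquerque   is my  turkey", "   ", "and he's    fine,  And he"]
print(clean_lines(sample))
```

The two helpers are not equivalent: `split()` also treats tabs and newlines as separators and trims both ends, while the `' +'` pattern touches only space characters. Which one fits depends on whether the essay file contains tabs that should be preserved.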