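I recently started learning Python, and I'm trying to write some code that does simple text editing. The program is supposed to take a UTF-8 encoded txt file, make sure every line from the second line onward is indented by one space, and remove any double or triple spaces.
My plan is to read the contents of the txt file and store them in a list, process the elements of that list, and finally write them back to the file (not implemented yet). I think the first part, the auto-indent code, is working.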
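For the code that detects and removes the unwanted spaces, I tried a function-based approach, and I think the function itself works. However, when I check the list contents in the main body of the code, they don't seem to have changed (they are still in their original state). What could I be doing wrong?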
To give an idea of what the file looks like, here is part of the txt file I'm trying to process.
Original:
There are various kinds of problems concerning human rights. Every day we hear news reporting on human rights violation. Human rights NGOs (For example, Amnesty International or Human Rights Watch) have been trying to deal with and resolve these problems in order to restore the human rights of individuals.
Expected:
There are various kinds of problems concerning human rights. Every day we hear news reporting on human rights violation. Human rights NGOs (For example, Amnesty International or Human Rights Watch) have been trying to deal with and resolve these problems in order to restore the human rights of individuals.
My code is as follows:
import os
os.getcwd()
os.chdir('D:')
os.chdir('/Documents/2011_data/TUFS_08_2011')
words = []

def indent(string):
    for x in range(0, len(string)):
        if x>0:
            if string[x]!= "\n":
                if string[x][0] != " ":
                    y = " " + string[x]

def delete(self):
    for x in self:
        x = x.replace("   ", " ")
        x = x.replace("  ", " ")
        x = x.replace("  ", " ")
        print(x, end='')
    return self

with open('dummy.txt', encoding='utf_8') as file:
    for line in file:
        words.append(line)
    file.close()

indent(words)
words = delete(words)
for x in words:
    print(x, end='')
Answer 0 (score: 0)
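Your delete function iterates over a list, binding each string to x, and then repeatedly rebinds x to the result of each replacement. But it never puts the results back into the list, so the list is returned unchanged.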
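A tiny illustration of this, using a made-up two-item list rather than the real file: assigning to the loop variable only rebinds that local name, while assigning through an index actually replaces the element stored in the list.

```python
words = ["a  b\n", "c  d\n"]

# Rebinding the loop variable: the list itself is never touched
for x in words:
    x = x.replace("  ", " ")
print(words)  # ['a  b\n', 'c  d\n']  (unchanged)

# Assigning through the index: each slot in the list is replaced
for i, x in enumerate(words):
    words[i] = x.replace("  ", " ")
print(words)  # ['a b\n', 'c d\n']
```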
The simplest approach is to build a new list containing the modified strings and return that instead.
def delete(words):
    result = []
    for x in words:
        # the "modify" step: collapse any run of spaces down to one
        while "  " in x:
            x = x.replace("  ", " ")
        result.append(x)
    return result
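Once delete() returns a new list, the write-back step that the question lists as not yet implemented could be sketched roughly as follows. rewrite_file and transform are hypothetical names chosen for this example; the helper just reads the lines, passes the list through a function that returns a new list, and overwrites the file with the result.

```python
from pathlib import Path


def rewrite_file(path, transform):
    """Hypothetical helper: read a UTF-8 text file into a list of lines,
    run transform() on that list, and overwrite the file with the result."""
    p = Path(path)
    # readlines() keeps each line's trailing "\n"
    with p.open(encoding="utf-8") as f:
        lines = f.readlines()
    # transform must RETURN the processed list, not just modify copies
    new_lines = transform(lines)
    with p.open("w", encoding="utf-8") as f:
        f.writelines(new_lines)
    return new_lines
```

For example, rewrite_file('dummy.txt', delete) would collapse the spaces and save the result. Because the file is overwritten in place, trying it on a copy first is the safer choice.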
(Note that using the name 'self' is not a good idea, because it suggests you are writing an object method, which you are not.)
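The question's indent() has the same problem: y is computed and then discarded, the function returns None, and the call indent(words) leaves the list as it was. A sketch of the same build-a-new-list pattern applied to it might look like this (the rules are taken from the original: skip the first line, skip blank "\n" lines, skip lines that already start with a space); startswith() is used instead of [0] so an empty string cannot raise an IndexError.

```python
def indent(lines):
    """Return a new list in which every line after the first that is
    not blank and does not already start with a space gets one
    leading space."""
    result = []
    for i, line in enumerate(lines):
        if i > 0 and line != "\n" and not line.startswith(" "):
            line = " " + line
        result.append(line)
    return result
```

It would then be called as words = indent(words), just like delete().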
Answer 1 (score: 0)
You can use split() and join():
In [1]: txt = '  This is a   text  with multiple    spaces.  '
Calling the string's split() method with no arguments gives you a list of the words with all the whitespace removed.
In [3]: txt.split()
Out[3]: ['This', 'is', 'a', 'text', 'with', 'multiple', 'spaces.']
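The "no arguments" part matters. A short comparison on a made-up string: split() treats any run of whitespace (including the trailing newline) as one separator, whereas split(" ") splits on every single space and leaves empty strings between consecutive spaces.

```python
txt = "a  b\n"

print(txt.split())     # ['a', 'b']
print(txt.split(" "))  # ['a', '', 'b\n']
```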
Then you can use the join() method with a single space:
In [4]: ' '.join(txt.split())
Out[4]: 'This is a text with multiple spaces.'
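One side effect of split()/join() is that it also drops the trailing newline and turns tabs into spaces. If only runs of spaces should be collapsed, a regular expression is an alternative; this swaps in Python's re module, which neither answer uses, and squeeze_spaces is a hypothetical name.

```python
import re


def squeeze_spaces(line):
    # replace every run of two or more spaces with a single space;
    # tabs and the trailing "\n" are left untouched
    return re.sub(r" {2,}", " ", line)


print(repr(squeeze_spaces("There  are   various kinds\n")))
# 'There are various kinds\n'
```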
If you want an extra space at the front, insert an empty string at the start of the list:
In [7]: s = txt.split()
In [8]: s
Out[8]: ['This', 'is', 'a', 'text', 'with', 'multiple', 'spaces.']
In [9]: s.insert(0, '')
In [10]: s
Out[10]: ['', 'This', 'is', 'a', 'text', 'with', 'multiple', 'spaces.']
In [11]: ' '.join(s)
Out[11]: ' This is a text with multiple spaces.'
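Putting both steps together for a list of lines read from the file, a minimal sketch (normalize is a hypothetical name) could collapse the whitespace on every line and use the empty-string trick on every line after the first. Because split() also removes the newline, the sketch appends "\n" to each line; note this adds one to the last line even if the file did not end with one.

```python
def normalize(lines):
    """Collapse whitespace in each line and prefix one space on every
    non-blank line after the first; blank lines stay blank."""
    result = []
    for i, line in enumerate(lines):
        words = line.split()
        if i > 0 and words:
            words.insert(0, "")  # join() then yields one leading space
        result.append(" ".join(words) + "\n")
    return result
```

With this, words = normalize(words) would replace both the indent() and delete() calls from the question.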