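I am trying to solve a problem that requires cleaning a text (getting rid of all punctuation and whitespace) and converting it all to the same case.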
with open("moby_01.txt") as infile, open("moby_01_clean_3.txt", "w") as outfile:
    for line in infile:
        line.lower
        ...
        cleaned_words = line.split("-")
        cleaned_words = "\n".join(cleaned_words)
        cleaned_words = line.strip().split()
        cleaned_words = "\n".join(cleaned_words)
        outfile.write(cleaned_words)
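I expected the program's output to be the words in the order they appear in the text, one per line. However, it turns out that only the last three lines of the for loop have any effect, and the output is a list of words with the punctuation still attached: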
Call
me
Ishmael.
Some
years
ago--never
mind
how
long
precisely--having
...
Answer 0 (score: 3)
You may want to change this line. Here you are using line again, so the result of the earlier steps is discarded.
cleaned_words = line.strip().split()
to
cleaned_words = cleaned_words.strip().split()
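To illustrate the point, here is a minimal sketch comparing the two approaches. The sample string is made up for illustration and is not read from moby_01.txt, and the variable names (discarded, overwritten, chained and so on) are hypothetical. It also shows that lower needs parentheses and returns a new string rather than changing line in place.

```python
# Hypothetical sample line; the real input comes from moby_01.txt.
line = "Some years ago--never mind how long precisely\n"

# As in the question: the second assignment starts again from `line`,
# so the result of splitting on "-" is thrown away.
discarded = "\n".join(line.split("-"))
overwritten = "\n".join(line.strip().split())

# Chained version: each step works on the result of the previous one.
# lower() must be called, and its return value must be kept.
lowered = line.lower()
dashes_split = "\n".join(lowered.split("-"))
chained = "\n".join(dashes_split.strip().split())

print(overwritten.split("\n"))  # still contains "ago--never"
print(chained.split("\n"))      # "ago" and "never" are separate words
```

This only handles the dashes and the case. Other punctuation, such as the period in "Ishmael.", would still need to be removed separately.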
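Answer 1 (score: 0)
I finally found a solution to this problem. The exercise book (The Quick Python Book, Third Edition, by Naomi Ceder), the Python documentation, and StackOverflow helped me.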
with open("moby_01.txt") as infile, open("moby_01_clean.txt", "w") as outfile:
    for line in infile:
        cleaned_line = line.lower()
        cleaned_line = cleaned_line.translate(str.maketrans("-", " ", ".,?!;:'\"\n"))
        words = cleaned_line.split()
        cleaned_words = "\n".join(words)
        outfile.write(cleaned_words + "\n")
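I moved the - symbol from the z argument of str.maketrans(x[, y[, z]]) to x, because otherwise some words joined by -- were still concatenated in the file. For the same reason, I added "\n" in the outfile.write() call.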
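The effect of where - is placed in str.maketrans can be checked on a single string. The following is a small sketch, assuming a made-up sample line rather than the real file. The names TABLE, clean_line, and delete_dash are hypothetical helpers introduced only for this illustration.

```python
# Translation table as in the answer: "-" (x) is replaced by " " (y),
# and every character in the third argument (z) is deleted.
TABLE = str.maketrans("-", " ", ".,?!;:'\"\n")


def clean_line(line):
    """Lowercase a line, apply the table, and return its words."""
    return line.lower().translate(TABLE).split()


# Hypothetical sample line, not read from moby_01.txt.
sample = "Some years ago--never mind how long precisely--having\n"
words = clean_line(sample)
print("\n".join(words))

# For comparison: if "-" is deleted (placed in z) instead of being
# mapped to a space, the two neighbouring words are glued together.
delete_dash = str.maketrans("", "", "-")
glued = "ago--never".translate(delete_dash)
print(glued)
```

Mapping - to a space rather than deleting it lets split() separate the two words afterwards, which matches the behaviour described above.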