A sequence of .split() calls only takes effect once

Date: 2019-05-29 05:27:29

Tags: python

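I am trying to solve a problem that requires cleaning up a text (getting rid of all punctuation and extra whitespace) and converting it to a single case.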

with open("moby_01.txt") as infile, open("moby_01_clean_3.txt", "w") as outfile:
    for line in infile:
        line.lower
        ...
        cleaned_words = line.split("-")
        cleaned_words = "\n".join(cleaned_words)
        cleaned_words = line.strip().split() 
        cleaned_words = "\n".join(cleaned_words)
        outfile.write(cleaned_words)

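I expected the program to output the words in the order they appear in the text, one word per line. Instead, it turns out that only the last three lines inside the for loop have any effect, and the output is a list in which punctuation is still attached to the words: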

Call
me
Ishmael.
Some
years
ago--never
mind
how
long
precisely--having
... 

2 answers:

Answer 0: (score: 3)

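You probably want to change this line. Here you are using line again, so the result of the earlier split("-") is discarded:

cleaned_words = line.strip().split() 

Change it to:

cleaned_words = cleaned_words.strip().split() 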
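
A minimal sketch of why this matters, using a made-up sample line rather than the real file. Each assignment that starts from `line` begins again from the original string, so whatever the previous step produced is thrown away. The variable names `lost` and `kept` are only for illustration. (Separately, `line.lower` in the question has no effect: it lacks the call parentheses and its result is never assigned.)

```python
# Hypothetical sample line standing in for one line of moby_01.txt.
line = "Some years ago--never mind how long precisely--having\n"

# Question's version: the second step starts from `line` again,
# so the result of split("-") is discarded.
lost = "\n".join(line.split("-"))
lost = "\n".join(line.strip().split())

# Suggested change: keep working on the intermediate result.
kept = "\n".join(line.split("-"))
kept = "\n".join(kept.strip().split())

print(lost.split("\n"))  # still contains 'ago--never'
print(kept.split("\n"))  # dashes are gone, one word per item
```

This only removes the dashes; other punctuation such as periods and commas would still need separate handling.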

Answer 1: (score: 0)

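I finally found a way to solve this problem. The exercise book (The Quick Python Book, Third Edition, by Naomi Ceder), the Python documentation, and StackOverflow helped me.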

with open("moby_01.txt") as infile, open("moby_01_clean.txt","w") as outfile:
    for line in infile:
        cleaned_line = line.lower()
        cleaned_line = cleaned_line.translate(str.maketrans("-", " ", ".,?!;:'\"\n"))
        words = cleaned_line.split()
        cleaned_words = "\n".join(words)
        outfile.write(cleaned_words + "\n")

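I moved the - symbol from the third argument z of str.maketrans(x[, y[, z]]) to x, because otherwise some words joined by -- were still run together in the file. For the same reason, I added \n to z.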
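
To illustrate the three-argument form, here is a small sketch on a made-up sample string (not the actual file). Each character in x is mapped to the character at the same position in y, and every character in z is deleted. The first table is an assumed reconstruction of the earlier attempt, with - placed in z; the variable names are hypothetical.

```python
# Hypothetical sample line; the real input comes from moby_01.txt.
sample = "Some years ago--never mind how long precisely--having\n"

# Assumed earlier attempt: "-" is in z, so it is deleted and the
# neighbouring words are glued together.
delete_dash = str.maketrans("", "", "-.,?!;:'\"\n")
glued = sample.lower().translate(delete_dash).split()

# Version from the answer: "-" in x maps to " " in y, so words joined
# by "--" are separated before split() runs.
dash_to_space = str.maketrans("-", " ", ".,?!;:'\"\n")
words = sample.lower().translate(dash_to_space).split()

print(glued)  # contains 'agonever' and 'preciselyhaving'
print(words)  # contains 'ago', 'never', 'precisely', 'having'
```

Because split() with no arguments collapses runs of whitespace, the two spaces produced by -- do not create empty strings in the result.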