Python: remove whole sentences from a long string (a novel)

Time: 2016-10-13 22:06:55

Tags: python string

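I have pasted a novel into a text file. I want to remove every line that contains one of the following sentences, because they keep appearing at the top of every page (just removing their occurrences within those lines would also be fine):

"Thermal Molecular Movement in , Order and Probability"

"Molecular and Ionic Interactions as the Basis for the Formation"

"Interfacial Phenomena and Membranes"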

My first attempt was the following:

mystring = file.read()
mystring=mystring.strip("Molecular Structure of Biological Systems")
mystring=mystring.strip("Thermal Molecular Movement in , Order and Probability")
mystring=mystring.strip("Molecular and Ionic Interactions as the Basis for the Formation")
mystring=mystring.strip("Interfacial Phenomena and Membranes")

new_file=open("no_refs.txt", "w")

new_file.write(mystring)

file.close()

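However, this had no effect on the output text file; its contents were completely unchanged. I find this strange, because the following toy example works fine: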

>>> "Hello this is a sentence. Please read it".strip("Please read it")
'Hello this is a sentence.'

Since that didn't work, I tried the following:

file=open("novel.txt", "r")
mystring = file.readlines()
for lines in mystring:
    if "Thermal Molecular Movement in , Order and Probability" in lines:
        mystring.replace(lines, "")
    elif "Molecular and Ionic Interactions as the Basis for the Formation" in lines:
        mystring.replace(lines, "")
    elif "Interfacial Phenomena and Membranes" in lines:
        mystring.replace(lines, "")
    else:
        continue

new_file=open("no_refs.txt", "w")

new_file.write(mystring)
new_file.close()
file.close()

But with this attempt I get this error:

TypeError: expected a string or other character buffer object

1 Answer:

Answer 0 (score: 2)

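  • First, str.strip() does not remove a substring. It treats its argument as a set of characters and strips any of those characters from the start and end of the string only. That is why it seemed to work in your toy example (the sentence happened to sit at the very end), but it is not what you want.
  • Second, you call replace on the list instead of on the current line, and you don't assign the result of the replacement (strings are immutable, so str.replace() returns a new string instead of changing the line in place).
  • Finally, file.write() expects a single string, not a list, so writing the list returned by readlines() is most likely what raises the TypeError; join the lines first with "".join(...).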
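A minimal sketch of the strip()/replace() difference, using a made-up two-page string (the `text` contents are hypothetical). When the header sits in the middle of the text, strip() finds nothing to trim at either end and returns the string unchanged, which matches what the question describes, while replace() removes the whole substring wherever it occurs.

```python
header = "Interfacial Phenomena and Membranes"
# Hypothetical novel text: the header sits between two pages.
text = "page one\n" + header + "\npage two\n"

# strip() trims characters from the *set* {I, n, t, e, r, f, a, ...} at both
# ends only. The first character 'p' and the last character '\n' are not in
# that set, so nothing is removed.
stripped = text.strip(header)

# replace() searches for the whole substring and removes every occurrence.
replaced = text.replace(header, "")

print(repr(stripped))  # 'page one\nInterfacial Phenomena and Membranes\npage two\n'
print(repr(replaced))  # 'page one\n\npage two\n'
```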

Here is a fixed version that successfully removes the patterns from the lines:

with open("novel.txt", "r") as file:
    mystring = file.readlines()

patterns = ["Thermal Molecular Movement in , Order and Probability",
            "Molecular and Ionic Interactions as the Basis for the Formation",
            "Interfacial Phenomena and Membranes"]

for i, line in enumerate(mystring):
    for pattern in patterns:
        if pattern in line:
            # reassign `line` so several patterns on one line are all removed
            line = line.replace(pattern, "")
    mystring[i] = line

# print the processed lines
print("".join(mystring))

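Note the enumerate construct, which lets you iterate over both the values and the indices. Iterating over the values alone would let you find the patterns, but not modify them in the original list.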
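A small sketch of why the index matters, on a hypothetical two-line list. Rebinding the loop variable only changes a local name, whereas assigning through the index written by enumerate() updates the list itself.

```python
header = "Interfacial Phenomena and Membranes"
original = [header + "\n", "real text\n"]  # hypothetical lines

# Iterating over values only: `line` is a name bound to each string, and
# rebinding it (strings are immutable) leaves the list untouched.
by_value = list(original)
for line in by_value:
    line = line.replace(header, "")

# enumerate() also yields the index, so the new string can be written back.
by_index = list(original)
for i, line in enumerate(by_index):
    by_index[i] = line.replace(header, "")

print(by_value)  # ['Interfacial Phenomena and Membranes\n', 'real text\n']
print(by_index)  # ['\n', 'real text\n']
```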

Also note the with open construct, which closes the file as soon as you leave the block.
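To make the with behavior concrete, here is a sketch that writes a throwaway file in a temporary directory (the file and its contents are hypothetical) and checks the file object's `closed` attribute inside and after the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "novel.txt")  # hypothetical throwaway file
    with open(path, "w") as out:
        out.write("some text\n")

    with open(path, "r") as file:
        lines = file.readlines()
        inside = file.closed  # False: still inside the with block
    after = file.closed       # True: closed as soon as the block was left

print(inside, after)  # False True
```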

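Here is a version that removes the lines containing the patterns entirely (hold on, there is some one-liner functional programming in there):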

with open("novel.txt", "r") as file:
    mystring = file.readlines()
    pattern_list = ["Thermal Molecular Movement in , Order and Probability",
                    "Molecular and Ionic Interactions as the Basis for the Formation",
                    "Interfacial Phenomena and Membranes"]
    # keep only the lines in which none of the patterns occur
    mystring = "".join(filter(lambda line: all(pattern not in line for pattern in pattern_list), mystring))
    # print the processed lines
    print(mystring)

Explanation: the list of lines is filtered on one condition: none of the unwanted patterns may appear in the line.
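For comparison, a sketch of the same filter written as a list comprehension with any(), which some readers find easier to follow than filter/lambda. The helper names `remove_header_lines` and `filter_file` are hypothetical, and `filter_file` assumes the caller passes in the input and output paths.

```python
PATTERNS = [
    "Thermal Molecular Movement in , Order and Probability",
    "Molecular and Ionic Interactions as the Basis for the Formation",
    "Interfacial Phenomena and Membranes",
]

def remove_header_lines(lines, patterns=PATTERNS):
    # Keep a line only if none of the unwanted patterns occurs in it
    # (same condition as the filter/lambda/all() one-liner).
    return [line for line in lines
            if not any(pattern in line for pattern in patterns)]

def filter_file(src_path, dst_path, patterns=PATTERNS):
    # Hypothetical helper: read the novel, drop header lines, write the rest.
    with open(src_path, "r") as src:
        kept = remove_header_lines(src.readlines(), patterns)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)  # writelines() accepts a list of strings

sample = ["Interfacial Phenomena and Membranes 42\n", "The cell membrane...\n"]
print(remove_header_lines(sample))  # ['The cell membrane...\n']
```

For example, `filter_file("novel.txt", "no_refs.txt")` would write the cleaned text to the output file named in the question.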