Question

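I have pasted a novel into a text file. I want to remove every line that contains one of the following sentences, because they keep appearing at the top of each page (simply removing their occurrences from those lines would also do):

"Thermal Molecular Movement in , Order and Probability"

"Molecular and Ionic Interactions as the Basis for the Formation"

"Interfacial Phenomena and Membranes"

My first attempt was as follows: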

mystring = file.read()
mystring=mystring.strip("Molecular Structure of Biological Systems")
mystring=mystring.strip("Thermal Molecular Movement in , Order and Probability")
mystring=mystring.strip("Molecular and Ionic Interactions as the Basis for the Formation")
mystring=mystring.strip("Interfacial Phenomena and Membranes")

new_file=open("no_refs.txt", "w")

new_file.write(mystring)

file.close()

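However, this had no effect on the output text file: the content was completely unchanged. I find this strange, because the following toy example works fine: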

>>> "Hello this is a sentence. Please read it".strip("Please read it")
'Hello this is a sentence.'

Since the approach above did not work, I tried the following:

file=open("novel.txt", "r")
mystring = file.readlines()
for lines in mystring:
    if "Thermal Molecular Movement in , Order and Probability" in lines:
        mystring.replace(lines, "")
    elif "Molecular and Ionic Interactions as the Basis for the Formation" in lines:
        mystring.replace(lines, "")
    elif "Interfacial Phenomena and Membranes" in lines:
        mystring.replace(lines, "")
    else:
        continue

new_file=open("no_refs.txt", "w")

new_file.write(mystring)
new_file.close()
file.close()

But with this attempt I got the following error:

TypeError: expected a string or other character buffer object

Answer 1

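First, str.strip() only removes characters from the start and the end of the string, and it treats its argument as a set of characters rather than as a phrase. That explains why it seemed to work in your toy example, but it is not actually what you want.
Second, you call replace on the list instead of on the current line (and you do not assign the result of the replacement).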
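
To make the first point concrete, here is a small sketch contrasting str.strip() with str.replace(). The sample text is made up for illustration; the header sits in the middle of the string, as it would in a pasted novel.

```python
# Hypothetical sample text: the header is in the middle, not at either end.
text = "Chapter one.\nInterfacial Phenomena and Membranes\nChapter two."
header = "Interfacial Phenomena and Membranes"

# strip() trims any of the characters found in `header` from both ends only.
# The text starts with "C" and ends with ".", neither of which is in that set,
# so nothing is removed.
stripped = text.strip(header)

# replace() removes the exact substring wherever it occurs.
replaced = text.replace(header, "")

print(repr(stripped))
print(repr(replaced))
```

In this sketch strip() returns the text unchanged, which matches what you saw with your novel, while replace() takes the header out.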

Here is a fixed version that successfully removes the patterns from the lines:

patterns = [
    "Thermal Molecular Movement in , Order and Probability",
    "Molecular and Ionic Interactions as the Basis for the Formation",
    "Interfacial Phenomena and Membranes",
]

with open("novel.txt", "r") as file:
    mystring = file.readlines()
    for i, line in enumerate(mystring):
        for pattern in patterns:
            # remove the header text but keep the rest of the line
            line = line.replace(pattern, "")
        # write the cleaned line back into the list
        mystring[i] = line

    # print the processed lines
    print("".join(mystring))

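Note the enumerate construct, which lets you iterate over both the values and their indices. Iterating over the values alone would let you find the patterns, but not modify them in the original list.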
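
As a minimal illustration of that difference (the two-line list and the "HEADER" marker are made-up sample data), compare rebinding the loop variable with assigning through the index:

```python
lines = ["keep me\n", "HEADER here\n"]  # made-up sample data

# Rebinding the loop variable only changes the local name, not the list.
for line in lines:
    line = line.replace("HEADER", "")
unchanged = list(lines)  # snapshot: still contains "HEADER"

# Assigning through the index updates the list itself.
for i, line in enumerate(lines):
    lines[i] = line.replace("HEADER", "")

print(unchanged)
print(lines)
```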

Also note the with open construct, which closes the file as soon as you leave the block.
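
The same construct can hold the input and the output file at once, which is one way to write the result to no_refs.txt as in your attempt. The sketch below is only an illustration: it creates a small made-up novel.txt in a temporary directory so that it runs anywhere.

```python
import os
import tempfile

# Made-up sample input, written to a temporary directory for this sketch only.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "novel.txt")
dst_path = os.path.join(workdir, "no_refs.txt")
with open(src_path, "w") as f:
    f.write("Interfacial Phenomena and Membranes\nFirst line of the page.\n")

# Both files are closed automatically when the block exits.
with open(src_path, "r") as src, open(dst_path, "w") as dst:
    for line in src:
        dst.write(line.replace("Interfacial Phenomena and Membranes", ""))

# Confirm that neither file is still open.
print(src.closed, dst.closed)
```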

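Here is a version that removes the lines containing the patterns entirely (hang on, there is some one-liner functional programming in there):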

with open("novel.txt", "r") as file:
    mystring = file.readlines()
    pattern_list = [
        "Thermal Molecular Movement in , Order and Probability",
        "Molecular and Ionic Interactions as the Basis for the Formation",
        "Interfacial Phenomena and Membranes",
    ]
    # keep only the lines that contain none of the patterns
    mystring = "".join(filter(lambda line: all(pattern not in line for pattern in pattern_list), mystring))
    # print the processed lines
    print(mystring)

Explanation: the list of lines is filtered on one condition: none of the unwanted patterns may appear in the line.
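
If the filter/lambda form is hard to read, the same condition can be written as a list comprehension. This is just an equivalent sketch; the helper name drop_header_lines and the sample lines are hypothetical, not part of your code.

```python
def drop_header_lines(lines, patterns):
    """Return only the lines that contain none of the given patterns."""
    return [line for line in lines if not any(p in line for p in patterns)]


# Made-up sample data for illustration.
patterns = ["Interfacial Phenomena and Membranes"]
sample = [
    "Interfacial Phenomena and Membranes\n",
    "The story continues here.\n",
]

cleaned = "".join(drop_header_lines(sample, patterns))
print(repr(cleaned))
```

Here "not any(p in line ...)" expresses the same test as "all(pattern not in line ...)" in the version above.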
