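I have pasted a novel into a text file. I would like to remove every line that contains one of the following sentences, because they keep appearing at the top of each page (removing just the sentences from those lines would also do):
"Thermal Molecular Movement in , Order and Probability"
"Molecular and Ionic Interactions as the Basis for the Formation"
"Interfacial Phenomena and Membranes"
My first attempt was as follows: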
mystring = file.read()
mystring=mystring.strip("Molecular Structure of Biological Systems")
mystring=mystring.strip("Thermal Molecular Movement in , Order and Probability")
mystring=mystring.strip("Molecular and Ionic Interactions as the Basis for the Formation")
mystring=mystring.strip("Interfacial Phenomena and Membranes")
new_file=open("no_refs.txt", "w")
new_file.write(mystring)
file.close()
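However, this had no effect on the output text file: the content was completely unchanged. I find this strange, because the following toy example works fine: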
>>> "Hello this is a sentence. Please read it".strip("Please read it")
'Hello this is a sentence.'
Since the approach above did not work, I tried the following:
file=open("novel.txt", "r")
mystring = file.readlines()
for lines in mystring:
    if "Thermal Molecular Movement in , Order and Probability" in lines:
        mystring.replace(lines, "")
    elif "Molecular and Ionic Interactions as the Basis for the Formation" in lines:
        mystring.replace(lines, "")
    elif "Interfacial Phenomena and Membranes" in lines:
        mystring.replace(lines, "")
    else:
        continue
new_file=open("no_refs.txt", "w")
new_file.write(mystring)
new_file.close()
file.close()
But with this attempt I get the following error:
TypeError: expected a string or other character buffer object
Answer 0 (score: 2)
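str.strip() only removes characters found at the start or end of the string, and it treats its argument as a set of characters rather than as a substring. That explains why it seemed to work in your test, but it is in fact not what you want. Here is a fixed version that removes the patterns from the lines: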
with open("novel.txt", "r") as file:
    mystring = file.readlines()
    for i,line in enumerate(mystring):
        for pattern in ["Thermal Molecular Movement in , Order and Probability","Molecular and Ionic Interactions as the Basis for the Formation","Interfacial Phenomena and Membranes"]:
            if pattern in line:
                mystring[i] = line.replace(pattern,"")
    # print the processed lines
    print("".join(mystring))
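Note the enumerate construct, which lets you iterate over both the values and their indices. Iterating over the values alone would let you find the patterns, but not modify the corresponding entries in the original list.
Also note the with open construct, which closes the file as soon as you leave the block.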
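To illustrate the str.strip() behaviour described earlier in this answer, here is a minimal sketch. The sample text is made up for the demonstration (it is not taken from the novel), and HEADER simply reuses one of the page headers from the question.

```python
# Made-up sample text; HEADER is one of the page headers from the question.
HEADER = "Interfacial Phenomena and Membranes"
text = "Intro line\n" + HEADER + "\nBody line\n"

# strip() trims, from both ends, any characters that occur in HEADER.
# It stops at the first character that is not in that set (here "\n"),
# so the header in the middle of the text is never reached.
stripped = text.strip(HEADER)

# replace() removes every occurrence of the exact substring.
replaced = text.replace(HEADER, "")

print(repr(stripped))
print(repr(replaced))
```

In this sample, strip() eats "Intro line" (every one of its characters also occurs in the header) yet leaves the header itself in place, while replace() removes the header and nothing else.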
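Here is a version that completely removes the lines containing the patterns (hang on, there is some one-liner functional programming in there):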
with open("novel.txt", "r") as file:
    mystring = file.readlines()
    pattern_list = ["Thermal Molecular Movement in , Order and Probability","Molecular and Ionic Interactions as the Basis for the Formation","Interfacial Phenomena and Membranes"]
    mystring = "".join(filter(lambda line:all(pattern not in line for pattern in pattern_list),mystring))
    # print the processed lines
    print(mystring)
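Explanation: the list of lines is filtered according to one condition: a line is kept only if none of the unwanted patterns appears in it.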
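Since the question writes its result to no_refs.txt rather than printing it, the same filter could be wrapped up roughly as follows. This is only a sketch: the helper names remove_header_lines and clean_file are hypothetical, and the file names are the ones used in the question.

```python
PATTERNS = [
    "Thermal Molecular Movement in , Order and Probability",
    "Molecular and Ionic Interactions as the Basis for the Formation",
    "Interfacial Phenomena and Membranes",
]

def remove_header_lines(lines, patterns):
    """Return only the lines that contain none of the given patterns."""
    return [line for line in lines if not any(p in line for p in patterns)]

def clean_file(src_path, dst_path, patterns):
    """Copy src_path to dst_path, dropping lines that contain a pattern."""
    with open(src_path, "r") as src:
        kept = remove_header_lines(src, patterns)
    with open(dst_path, "w") as dst:
        # The kept lines still carry their "\n", so writelines() is enough.
        dst.writelines(kept)

# Assumed usage, with the file names from the question:
# clean_file("novel.txt", "no_refs.txt", PATTERNS)
```

The list comprehension with any() is equivalent to the filter/all one-liner above; it is merely spelled the other way round.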