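I have an HTML file containing more than 300 lines of data. I want to delete everything below a specific line. For example, how can I delete all the data below the following lines?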
<pre>
Page 5
If possible, please keep the closing tags, which are the last line of the HTML:
<hr></body></html>
I wrote the following code, but it only removes the specific "Page 5" line. I want to remove all the lines below "Page 3". How can I do that?
f = open("4105.html", "r")
lines = f.readlines()
f.close()
f = open("4105-modified.html", "w")
for line in lines:
    if line != '''Page 5''' + "\n":
        f.write(line)
f.close()
Answer 0 (score: 2)
Stop writing lines once you find Page 5:
with open('4105.html') as inf, open('4105-modified.html', 'w') as outf:
    for line in inf:
        outf.write(line)
        if line == 'Page 5\n':
            break
    # if you want the last tags to remain
    outf.write('<hr></body></html>')
I would consider using an HTML parser such as BeautifulSoup.
Edit, per the comments (untested):
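with open('4105.html') as inf, open('4105-modified.html', 'w') as outf:
    lines = inf.readlines()
    try:
        idx = lines.index('Page 5\n')  # list.index raises ValueError if the line is absent
    except ValueError:
        idx = None
    if idx is not None:  # found it
        if idx > 0:
            del lines[idx - 1]  # delete the line before the marker
            idx -= 1  # the marker has moved up by one position
        del lines[idx + 1:-1]  # delete everything after the marker except the last line, to keep the trailing tags
    outf.write(''.join(lines))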
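The stop-at-the-marker idea can also be wrapped in a small function so it can be tried on in-memory text before touching the real file. This is only a sketch: the function name `truncate_after` and its `marker` and `closing` parameters are made up for illustration, and it assumes the marker occupies a whole line by itself.

```python
import io


def truncate_after(src, dst, marker="Page 5", closing="<hr></body></html>\n"):
    """Copy lines from src to dst up to and including the marker line.

    When the marker is found, `closing` is written after it and True is
    returned. When it is missing, the input is copied unchanged, nothing is
    appended, and False is returned.
    """
    for line in src:
        dst.write(line)
        # Compare without the line ending so "\n" and "\r\n" files both match.
        if line.rstrip("\r\n") == marker:
            if not line.endswith("\n"):
                dst.write("\n")  # marker was the final line with no newline
            dst.write(closing)
            return True
    return False


# Try it on a small in-memory document instead of a real file.
sample = "<html><body>\n<pre>\nPage 5\nline a\nline b\n<hr></body></html>\n"
out = io.StringIO()
found = truncate_after(io.StringIO(sample), out)
print(found)
print(out.getvalue(), end="")
```

With real files, the same call would take the two handles from `with open('4105.html') as inf, open('4105-modified.html', 'w') as outf:`. Returning a flag lets the caller decide what to do when the marker never appears, rather than silently appending the closing tags a second time.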