Question

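I have an HTML file that contains more than 300 lines of data. I want to delete all data below a specific line. For example, I want to delete everything below the following line. How can I do this?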

<pre>
Page 5

If possible, please keep the closing tags, which are the last line of the HTML.

<hr></body></html>

I wrote the following code, but it only deletes the specific line (Page 5) itself. I want to delete all lines below "Page 5". How can I do this?

f = open("4105.html","r")
lines = f.readlines()
f.close()
f = open("4105-modified.html","w")
for line in lines:
  if line!='''Page 5'''+"\n":
    f.write(line)

Answer 1

Stop writing lines once Page 5 has been found:

with open('4105.html') as inf, open('4105-modified.html','w') as outf:
    for line in inf:
        outf.write(line)
        if line == 'Page 5\n':
            break

    # if you want the last tags to remain
    outf.write('<hr></body></html>')
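
The comparison `line == 'Page 5\n'` matches only when the line is exactly that text followed by a newline, so trailing spaces would defeat it. Below is a minimal sketch of the same streaming approach with whitespace-tolerant matching. The function name `copy_until_marker` and its parameters are illustrative and not part of the original answer. Unlike the snippet above, it writes the closing tags only when the marker was actually found, which is an assumption about the desired behavior.

```python
import io


def copy_until_marker(src, dst, marker="Page 5", closing="<hr></body></html>\n"):
    """Copy lines from src to dst, up to and including the marker line.

    Returns True if the marker was found. The closing tags are appended
    only in that case; otherwise the input has been copied unchanged.
    """
    for line in src:
        dst.write(line)
        # strip() ignores the trailing newline and any surrounding spaces
        if line.strip() == marker:
            dst.write(closing)
            return True
    return False


# Usage with in-memory files; real code would pass objects returned by open()
src = io.StringIO("<html><body>\n<pre>\nPage 5  \nremoved\n<hr></body></html>\n")
dst = io.StringIO()
found = copy_until_marker(src, dst)
print(found)
print(dst.getvalue())
```

Because the function takes any iterable of lines and any object with a `write` method, it can be tried on `io.StringIO` objects before being pointed at the real files.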

I would consider using an HTML parser such as BeautifulSoup.

Revised per the comments (untested):

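with open('4105.html') as inf, open('4105-modified.html', 'w') as outf:
    lines = inf.readlines()
    try:
        idx = lines.index('Page 5\n')  # raises ValueError if absent; never returns -1
    except ValueError:
        idx = None
    if idx is not None:  # found it
        del lines[idx + 1:-1]  # delete everything after 'Page 5' except the last line, to keep the trailing tags
        if idx > 0:
            del lines[idx - 1]  # delete the line before 'Page 5'
    outf.write(''.join(lines))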

How to delete all data below a specific line of HTML with Python 3
