How to clean HTML, keeping only <a> <b> <i> <p> tags?

Posted: 2019-01-16 15:20:35

Tags: html regex text

I have to process a very large amount of HTML text for EPUB conversion, and every "automated" solution I have found and tried has been far less than satisfactory.

So I was thinking about a batch regex solution, but I know too little regex to make it work, especially since tags can be nested. Can anybody help or point me to a surefire solution?

Thanks in advance!

1 answer:

Answer 0 (score: 0)

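The best solution is to use an HTML parser. For simple cases, you can try the following regular expression: <[abip]>[^<>]*<\/[abip]>|<[abip][^<>]*\/>
Note that this regex only matches the allowed elements when they have no attributes and contain no nested tags, or when they are self-closing (e.g. <a href="..."/>). It does not check that the opening and closing tag names agree, and it does not remove the other tags by itself.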
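To illustrate the parser-based approach, here is a minimal sketch using Python's standard-library html.parser.HTMLParser. Python is an assumption, since the answer names no language or library. The sketch keeps <a>, <b>, <i> and <p>, drops every other tag but keeps its text, and discards the contents of <script> and <style>. The names WhitelistSanitizer, sanitize, ALLOWED_ATTRS and SKIP_CONTENT, and the choice to keep only href on <a>, are hypothetical. It does not validate URLs (e.g. javascript: links) or repair unbalanced markup, so treat it as a starting point rather than a complete sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Tags to keep (from the question); all other tags are dropped, their text is kept.
ALLOWED_TAGS = {"a", "b", "i", "p"}
# Hypothetical policy: keep only href on <a>; every other attribute is dropped.
ALLOWED_ATTRS = {"a": {"href"}}
# Hypothetical policy: drop these elements together with their contents.
SKIP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        # convert_charrefs=True: entities like &amp; arrive as plain characters.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        keep = ALLOWED_ATTRS.get(tag, set())
        parts = [tag]
        for name, value in attrs:
            if name in keep and value is not None:
                # Re-escape attribute values so quotes cannot break out.
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept (re-escaped) unless it is inside a skipped element.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text with only whitelisted tags left in place."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<div><p>Hi <span>there</span>, <a href="x.html" onclick="y()">link</a></p></div>'
    print(sanitize(sample))
```

Because the parser walks the document tag by tag, nested tags and tags with attributes are handled, which is where the single regex above breaks down. For a batch job, sanitize() can be called on each file's contents in a loop.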