Question

I need to match everything between HTML tags or, if there is another way, to extract all the information from between the tags.

Here is a sample of the data:

<B>stuff here</B>

<B>Changes in the taxicab and <FONT STYLE="white-space:nowrap">for-
hire</FONT>  vehicle industries have resulted in increased competition and  
have had a material adverse effect on our business, financial condition, and 
operations.  </B>


medallions. </P> <P STYLE="margin-top:12pt; margin-bottom:0pt; text-indent:4%; font-size:10pt; font-family:Times New Roman"><B>We borrow money, which magnifies the potential for gain or loss on amounts invested, and may increase the risk of investing in us. </B></P>

These are the matches I need to get from this small block:

<B>stuff here</B>

<B>Changes in the taxicab and <FONT STYLE="white-space:nowrap">for-
hire</FONT>  vehicle industries have resulted in increased competition and  
have had a material adverse effect on our business, financial condition, and 
operations.  </B>

<B>We borrow money, which magnifies the potential for gain or loss on amounts invested, and may increase the risk of investing in us. </B>

Here are a couple of regular expressions I have tried; neither works the way I want:

re.compile("<[Bb]>[\!\@\#\$\%\^\&\*\(\)\_\+\-\=\,\.\/\<\?\:\"\;\'\{\}\[\]\|\\\w\d\s]*<\/[Bb]>", re.MULTILINE)
re.compile("<[Bb]>.+<\/[Bb]>", re.MULTILINE)

Or is there a better way to do this without regular expressions?

I am currently loading the HTML content into a text file to strip the indentation.

Answer 1

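You can match everything between the <B> tags with the following pattern:

 (?s)(?<=<B>).*?(?=<\/B>)

This uses a positive lookbehind ((?<=<B>)) and a positive lookahead ((?=<\/B>)) to match whatever sits between the tags without including the tags themselves. The (?s) flag lets . match newlines, and the lazy .*? stops at the nearest </B>; a greedy .* would run from the first <B> to the last </B> in the input.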
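
As a minimal sketch of how this might look in Python, the snippet below applies a lazy, dot-matches-newline pattern with re.findall. The sample string is a shortened version of the data in the question, and the names BOLD_WITH_TAGS and BOLD_INNER are made up for illustration. The first pattern keeps the surrounding tags, which is what the expected matches in the question show; the second uses lookarounds to return only the inner text.

```python
import re

# Shortened sample modeled on the data in the question (illustrative only).
html = """<B>stuff here</B>

<B>Changes in the taxicab and <FONT STYLE="white-space:nowrap">for-
hire</FONT>  vehicle industries have resulted in increased competition.  </B>

medallions. </P> <P STYLE="margin-top:12pt"><B>We borrow money. </B></P>"""

# re.DOTALL lets "." cross newlines; re.IGNORECASE accepts <b> as well as <B>.
# ".*?" is lazy, so each match ends at the nearest closing tag.
BOLD_WITH_TAGS = re.compile(r"<b>.*?</b>", re.DOTALL | re.IGNORECASE)

# Same idea with lookarounds: the tags are required but not part of the match.
BOLD_INNER = re.compile(r"(?<=<b>).*?(?=</b>)", re.DOTALL | re.IGNORECASE)

with_tags = BOLD_WITH_TAGS.findall(html)
inner = BOLD_INNER.findall(html)

for match in with_tags:
    print(repr(match))
```

Note that re.MULTILINE, used in the attempts in the question, only changes how ^ and $ behave; it is re.DOTALL (or the inline (?s)) that lets . span line breaks. This sketch assumes the <B> tags carry no attributes and are not nested inside one another; for input where that does not hold, an HTML parser is likely to be more reliable than a regular expression.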
