Question

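I'm building a scraper for testing purposes (no, I'm not scraping news sites), and I want to strip all the useless information from the source, such as the links and text appended at the bottom. That trailing data is always preceded by a static paragraph, so I want to match that paragraph and remove it along with everything that follows it.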

The static paragraph is always: <p><strong>Static text here</strong></p>

An example of the full text I want the regex to capture is:

<p><strong>Static text here</strong></p>
<p><strong>This is a paragraph</strong></p>
<p>This is another, a normal weight one</p>
<p><img src="test.png">Here's an image</p>

Another example might be:

<p><strong>Static text here</strong></p>
<p><img src="test.png">Here's an image</p>
<p>another example</p>

Any ideas? I came up with this regex, but it only matches the first line, whereas I want to match every line that follows the static text: https://regex101.com/r/xS873u/2
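
To make the goal concrete, here is a minimal sketch of one possible approach. It assumes Python's `re` module, since the question does not name a language or regex flavor, and the helper name `strip_tail` and the "Article body to keep" line are hypothetical, added only for illustration. The idea is to match the static paragraph literally and then let `.*` run to the end of the input with dot-matches-newline enabled.

```python
import re

# Marker paragraph quoted in the question.
MARKER = "<p><strong>Static text here</strong></p>"

# re.escape keeps the marker literal; with re.DOTALL, ".*" also crosses
# newlines, so the match runs from the marker to the end of the input.
TAIL = re.compile(re.escape(MARKER) + r".*", re.DOTALL)


def strip_tail(html: str) -> str:
    """Remove the marker paragraph and everything after it (hypothetical helper)."""
    return TAIL.sub("", html)


# Illustrative input: the first line is an assumed piece of content to keep.
sample = (
    "<p>Article body to keep</p>\n"
    "<p><strong>Static text here</strong></p>\n"
    "<p><strong>This is a paragraph</strong></p>\n"
    "<p><img src=\"test.png\">Here's an image</p>\n"
)

cleaned = strip_tail(sample)
print(cleaned)
```

Without a dot-matches-newline mode, `.` stops at the first line break, which may be why the pattern only matches the first line. In flavors that lack such a flag, `[\s\S]*` in place of `.*` behaves the same way.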

REGEX: Match an unlimited number of lines after specific text

0 answers: