Question

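I'm currently trying to extract social media channels from websites by running regular expressions over the page source. The channels I'm looking for are RSS, Facebook, Twitter, YouTube and Instagram. The regex works correctly in most cases, but I've run into a strange problem where a lazy search matches more than necessary.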

Currently I'm using this regex: (?<=href=").*?twitter\.com.*?(?=")

On most pages I get the expected result, which looks like this: https://twitter.com/xxxxx

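However, I've hit a case where the match somehow starts at the first href and then selects everything up to and including the href that contains the Twitter link.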

<li><a href="https://www.facebook.com/xxxxx" title="Follow us on
Facebook" target="_blank" rel="noopener" 
data-event-category="Top" data-event-action="Social" 
data-event-label="Facebook" data-event-non-interaction="true">
<i class="icon icon--facebook--light"></i></a></li><li><a 
href="https://twitter.com/xxxxx" 
title="Follow us on Twitter" target="_blank" 
rel="noopener" data-event-category="Top" 
data-event-action="Social" 
data-event-label="Twitter" data-event-non-interaction="true">
<i class="icon icon--twitter--light"></i></a>

The match produced by the same regex:

https://www.facebook.com/xxxxx" title="Follow us on Facebook" target="_blank" rel="noopener" data-event-category="Top" data-event-action="Social" data-event-label="Facebook" data-event-non-interaction="true"><i class="icon icon--facebook--light"></i></a></li><li><a href="https://twitter.com/xxxxx
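
For anyone who wants to reproduce this, here is a minimal Python sketch. It assumes Python's `re` module (the question doesn't say which regex engine is in use) and a shortened, single-line copy of the markup above; the variable names are made up for illustration. The lazy `.*?` only minimizes how far the match extends to the right. The engine still takes the leftmost position where the lookbehind succeeds, which is right after the first `href="`, and expands from there until `twitter\.com` is found.

```python
import re

# Shortened, single-line copy of the markup from the question (assumption:
# the real page source has no line breaks inside this fragment).
html = (
    '<li><a href="https://www.facebook.com/xxxxx" title="Follow us on Facebook">'
    '<i class="icon icon--facebook--light"></i></a></li>'
    '<li><a href="https://twitter.com/xxxxx" title="Follow us on Twitter">'
    '<i class="icon icon--twitter--light"></i></a>'
)

# The original pattern: lazy quantifiers, but no restriction on what ".*?" may cross.
lazy = re.compile(r'(?<=href=").*?twitter\.com.*?(?=")')

match = lazy.search(html)
overlong = match.group(0)

# The match begins at the Facebook URL and only ends at the Twitter URL.
print(overlong.startswith('https://www.facebook.com/xxxxx'))
print(overlong.endswith('https://twitter.com/xxxxx'))
```

Both checks print `True`, which mirrors the overlong match shown above.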

Edit: Solved thanks to the duplicate post linked by Wiktor Stribiżew. For anyone interested, this is what the solution looks like: (?<=href\=")(?:(?!href\=).)*?twitter.com.*?(?=")
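
A hedged Python sketch of that fix, again assuming the `re` module and a shortened single-line copy of the markup. The `(?:(?!href=).)*?` part consumes one character at a time but refuses to step over another `href=`, so an attempt that starts at the Facebook link fails and the engine retries at the Twitter link. In this sketch the unneeded backslash before `=` is dropped and the dot in `twitter\.com` is escaped. The negated character class variant is an alternative added here for comparison, not something taken from the linked post.

```python
import re

# Shortened, single-line copy of the markup from the question (assumption).
html = (
    '<li><a href="https://www.facebook.com/xxxxx" title="Follow us on Facebook">'
    '<i class="icon icon--facebook--light"></i></a></li>'
    '<li><a href="https://twitter.com/xxxxx" title="Follow us on Twitter">'
    '<i class="icon icon--twitter--light"></i></a>'
)

# Tempered pattern: each consumed character must not start another "href=".
tempered = re.compile(r'(?<=href=")(?:(?!href=).)*?twitter\.com.*?(?=")')

# Alternative (not from the post): never leave the current attribute value,
# because [^"] cannot cross the closing quote.
negated = re.compile(r'(?<=href=")[^"]*twitter\.com[^"]*')

tempered_links = tempered.findall(html)
negated_links = negated.findall(html)

print(tempered_links)
print(negated_links)
```

The negated class tends to be simpler and cheaper when the URL is known to sit inside one double-quoted attribute value; the tempered version is closer to what the edit above describes.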

regex, lazy-search, lookahead, lookbehind

0 answers: