Question

I have links like these:

<a href="#" class="social google">Google</a>
<a href="#" class="social yahoo">Yahoo</a>
<a href="#" class="social facebook">Facebook</a>

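Now I want to use a regular expression to match only the anchor text; that is, for the first link it should match only the text Google.

I tried this pattern:

(?<=<a href="#" class="social .+?">).+?(?=</a>)

But it does not work as expected.

Can someone give me the correct syntax?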

Answer 1

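I would suggest using a capturing group to grab the part you want, rather than using lookbehind and lookahead to exclude the parts you don't want: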

<a href="#" class="social .+?">(.+?)</a>

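Conceptually, lookarounds are meant for overlapping matches. This task does not seem to need that capability.

(Of course, the usual caveats about parsing HTML with regular expressions apply.)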
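
A minimal sketch of the capturing-group approach, assuming PHP's preg_match_all (the question does not name a language; PHP is chosen here only because another answer in this thread uses it). The variable names $html, $pattern and $anchorTexts are made up for illustration.

```php
<?php
// Sample input: the three links from the question.
$html = '<a href="#" class="social google">Google</a>' . "\n"
      . '<a href="#" class="social yahoo">Yahoo</a>' . "\n"
      . '<a href="#" class="social facebook">Facebook</a>';

// The whole tag is matched, but only the parenthesized part is captured.
$pattern = '~<a href="#" class="social .+?">(.+?)</a>~';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
$anchorTexts = $matches[1];
print_r($anchorTexts);
```

The full match still includes the surrounding tag; the anchor text is read from group 1 instead of being isolated with lookarounds.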

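Update: This is not just a matter of best practice. The regex that uses lookbehind will actually produce incorrect results, because it allows the lookbehind part to overlap with other matches. Consider this input: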

<a href="#" class="social google">Google</a>

...

<a class="bad">foo</a>

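Your regex will not only match "Google"; it will also match "foo", because the .+? that was supposed to match only part of the class string can extend all the way into another link in the text.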
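
For contrast, here is a hedged sketch of how the capturing-group pattern behaves on that same input, assuming PHP's preg_match_all. Only the capturing-group version is exercised: as far as I know, PHP's PCRE engine does not accept an unbounded .+? inside a lookbehind, so the lookbehind pattern from the question cannot be run there as written. The variable names are illustrative.

```php
<?php
// Input from the update: one "social" link followed by an unrelated link.
$html = '<a href="#" class="social google">Google</a>' . "\n\n...\n\n"
      . '<a class="bad">foo</a>';

// The "s" modifier lets "." match newlines, the worst case for ".+?".
$pattern = '~<a href="#" class="social .+?">(.+?)</a>~s';

preg_match_all($pattern, $html, $matches);

// The literal prefix is consumed by each match, so it cannot be reused
// by a later link the way a lookbehind could reuse it.
$anchorTexts = $matches[1];
print_r($anchorTexts);
```

Because the prefix `<a href="#" class="social ` has to appear at the start of every match, the second link never qualifies, and "foo" is not returned.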

Answer 2

Try this:

  "~<a(>| .*?>)(.*?)</a>~si"

or

   "/<a(>| .*?>)(.*?)</a>/"

PHP example:

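  $notecomments = '<a id="234" class="asf">fdgsd</a> <a>fdgsd</a>';

  // Group 1 is the rest of the opening tag; group 2 is the anchor text.
  $output = preg_replace_callback('~<a(>| .*?>)(.*?)</a>~si', function ($matches) {
      print_r($matches[2]);   // print the anchor text
      return '';              // remove the matched link from the output
  }, ' ' . $notecomments . ' ');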

This will give you all the anchor texts.

And this one returns only the links whose class contains "social":

  "#<a .*?class=\".*?social.*?\".*?>(.*?)</a>#"

Sample:

  $notecomments = '<a id="234" class="fas social ads">fdgsd</a> <a>fdgsd</a>';

  $output = preg_replace_callback('#<a .*?class=".*?social.*?".*?>(.*?)</a>#', function ($matches) {
      print_r($matches);      // [0] is the whole link, [1] is the anchor text
      return '';
  }, ' ' . $notecomments . ' ');
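
If the goal is only to collect the anchor texts rather than to rewrite the string, preg_match_all may be a simpler fit than a callback. This is a sketch under that assumption, reusing the same pattern and sample string; $pattern and $socialTexts are illustrative names.

```php
<?php
$notecomments = '<a id="234" class="fas social ads">fdgsd</a> <a>fdgsd</a>';

// Same pattern as above; single quotes avoid escaping the double quotes.
$pattern = '#<a .*?class=".*?social.*?".*?>(.*?)</a>#';

preg_match_all($pattern, $notecomments, $matches);

// Group 1 of every match: the text of each link whose class contains "social".
$socialTexts = $matches[1];
print_r($socialTexts);
```

The second link has no class attribute, so only the first link's text is collected.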

Answer 3

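You are probably getting the right result, but because your pattern contains other groups (?...), your matches also contain data you don't want.

You could try using non-capturing groups (?:...) and put the part you want to appear in the result inside a capturing group (.+?).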
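
One way to read this suggestion, sketched in PHP with preg_match_all as an assumption: the pattern below is adapted from Answer 2, with its first group made non-capturing so that the anchor text becomes group 1. The sample string and variable names are illustrative.

```php
<?php
$html = '<a href="#" class="social google">Google</a> <a>Plain</a>';

// (?:...) groups the alternation without capturing it,
// so the only capture group left is the anchor text.
$pattern = '~<a(?:>| .*?>)(.*?)</a>~si';

preg_match_all($pattern, $html, $matches);

// $matches has two entries: [0] full matches, [1] anchor texts.
$anchorTexts = $matches[1];
print_r($anchorTexts);
```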

Answer 4

Try this regex:

\<a .*?\>(.*?)\<\/a\>

Edit 1 - This regex matches anchors that have the CSS class "social":

\<a .*?class=".*?\bsocial\b.*?\>(.*?)\<\/a\>
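
A short sketch of what the \b word boundaries buy, assuming PHP's preg_match_all and a made-up sample string: a class named antisocial contains the letters "social" but is not matched.

```php
<?php
$html = '<a href="#" class="social google">Google</a> '
      . '<a href="#" class="antisocial">Nope</a>';

// The regex from Edit 1, wrapped in "~" delimiters for PHP.
$pattern = '~\<a .*?class=".*?\bsocial\b.*?\>(.*?)\<\/a\>~';

preg_match_all($pattern, $html, $matches);

// Only links whose class list contains "social" as a whole word.
$socialTexts = $matches[1];
print_r($socialTexts);
```

Note that \b also sees a boundary next to a hyphen, so a class such as social-media would still match.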

Regex lookbehind assertion - matching link anchor text
