PHP: find all links in a string that match a given pattern

Time: 2017-09-29 09:49:25

Tags: php regex

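I am importing articles from a remote API endpoint, and during the import I need to find all links in a string that contain a given word. For example, if I get a string like this: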

<a href='http://myhost.com/Se-hva-vi-gjoer'>Les mer </a>blbl blblb blblb blblb<a href='https://myhost.com/Se-hva-vi-gjoer/Positive-women'>Les mer </a>

I need to find all links in the string that contain myhost.com. I tried this preg_match_all call:

preg_match_all('@(https?://myhost.com)?([^/]+)@i', $string , $linkMatches);

But that gives me this array:

array:3 [
  0 => array:8 [
    0 => "<a href='http:"
    1 => "myhost.com"
    2 => "Se-hva-vi-gjoer'>Les mer <"
    3 => "a>blbl blblb blblb blblb<a href='https:"
    4 => "myhost.com"
    5 => "Se-hva-vi-gjoer"
    6 => "Positive-women'>Les mer <"
    7 => "a>"
  ]
  1 => array:8 [
    0 => ""
    1 => ""
    2 => ""
    3 => ""
    4 => ""
    5 => ""
    6 => ""
    7 => ""
  ]
  2 => array:8 [
    0 => "<a href='http:"
    1 => "myhost.com"
    2 => "Se-hva-vi-gjoer'>Les mer <"
    3 => "a>blbl blblb blblb blblb<a href='https:"
    4 => "myhost.com"
    5 => "Se-hva-vi-gjoer"
    6 => "Positive-women'>Les mer <"
    7 => "a>"
  ]
]

What I want is an array with these strings:

http://myhost.com/Se-hva-vi-gjoer
https://myhost.com/Se-hva-vi-gjoer/Positive-women

What is the correct regular expression?

2 answers:

Answer 0: (score: 0)

I would solve it as follows:

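  1. Parse the HTML with an HTML parser instead of treating it as a plain string (for example: http://simplehtmldom.sourceforge.net/).
  2. Then you can use a statement like the one below.
  3. Code example: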

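    // Load the Simple HTML DOM library (adjust the path to wherever you saved it)
    require_once 'simple_html_dom.php';
    // Fetch the HTML by URL; if you already have it as a string, use str_get_html($string) instead
    $html = file_get_html('http://www.google.com/');
    // Find all links (anchors carry the href attribute; images do not)
    foreach ($html->find('a') as $element) {
        // $checkURL is the string you want to compare against
        if ($element->href === $checkURL) {
            return $element->href;
        }
    }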
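
If you would rather not pull in a third-party library, the same idea can probably be sketched with PHP's built-in DOM extension. This is only a rough sketch under a few assumptions: the `dom` extension is enabled, the input is the sample string from the question, and "contains myhost.com" is taken to mean that the link's host is exactly `myhost.com`. The variable names (`$links`, `$anchor`, `$hostName`) are mine.

```php
<?php
// Sample input taken from the question.
$string = "<a href='http://myhost.com/Se-hva-vi-gjoer'>Les mer </a>blbl blblb blblb blblb"
        . "<a href='https://myhost.com/Se-hva-vi-gjoer/Positive-women'>Les mer </a>";

// Collect libxml warnings internally instead of printing them;
// HTML fragments often trigger harmless parser warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($string);
libxml_clear_errors();

$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href');

    // parse_url() yields the host part, or null/false when there is none.
    $hostName = parse_url($href, PHP_URL_HOST);

    // Assumption: only an exact, case-insensitive host match counts.
    if (is_string($hostName) && strcasecmp($hostName, 'myhost.com') === 0) {
        $links[] = $href;
    }
}

print_r($links);
```

Comparing the parsed host rather than searching the whole URL avoids false positives such as `myhost.com` appearing only in a path or query string. If subdomains should count too, the comparison would need to be loosened.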
    

Answer 1: (score: 0)

You can try this:

preg_match_all('/[\'\"](https?\:\/\/[^\'\"]?myhost\.com[^\'\"]*)[\'\"]/i', $string, $linkMatches);
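
For reference, here is a minimal runnable sketch of this kind of approach against the sample string from the question. The `~` delimiter, the use of `preg_quote()`, and the `$host`, `$pattern` and `$links` names are my own choices for illustration. The sketch also assumes the host follows `://` directly, with no subdomain. The captured URLs end up in index 1 of the result; index 0 holds the full matches including the surrounding quotes.

```php
<?php
// Sample input taken from the question.
$string = "<a href='http://myhost.com/Se-hva-vi-gjoer'>Les mer </a>blbl blblb blblb blblb"
        . "<a href='https://myhost.com/Se-hva-vi-gjoer/Positive-women'>Les mer </a>";

// Host to look for; preg_quote() escapes the dot so it matches literally.
$host = 'myhost.com';

// Match a quoted value that starts with http(s)://<host> and capture it
// without the quotes. '~' is the delimiter, so '/' needs no escaping.
$pattern = '~[\'"](https?://' . preg_quote($host, '~') . '[^\'"]*)[\'"]~i';

preg_match_all($pattern, $string, $linkMatches);

// $linkMatches[0] = full matches (with quotes), $linkMatches[1] = first capture group.
$links = $linkMatches[1];

print_r($links);
```

Note that this matches any quoted value starting with the host URL, not only `href` attributes, so it may also pick up `src` or other attributes that point at the same host.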