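I am importing articles from a remote API endpoint, and during the import I need to find all links in a string that contain a given word. For example, if I get a string like this: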
<a href='http://myhost.com/Se-hva-vi-gjoer'>Les mer </a>blbl blblb blblb blblb<a href='https://myhost.com/Se-hva-vi-gjoer/Positive-women'>Les mer </a>
I have to find all links in the string that contain myhost.com. I tried this preg_match_all call:
preg_match_all('@(https?://myhost.com)?([^/]+)@i', $string , $linkMatches);
However, this gives me the following array:
array:3 [
0 => array:8 [
0 => "<a href='http:"
1 => "myhost.com"
2 => "Se-hva-vi-gjoer'>Les mer <"
3 => "a>blbl blblb blblb blblb<a href='https:"
4 => "myhost.com"
5 => "Se-hva-vi-gjoer"
6 => "Positive-women'>Les mer <"
7 => "a>"
]
1 => array:8 [
0 => ""
1 => ""
2 => ""
3 => ""
4 => ""
5 => ""
6 => ""
7 => ""
]
2 => array:8 [
0 => "<a href='http:"
1 => "myhost.com"
2 => "Se-hva-vi-gjoer'>Les mer <"
3 => "a>blbl blblb blblb blblb<a href='https:"
4 => "myhost.com"
5 => "Se-hva-vi-gjoer"
6 => "Positive-women'>Les mer <"
7 => "a>"
]
]
What I want is an array containing these strings:
http://myhost.com/Se-hva-vi-gjoer
and https://myhost.com/Se-hva-vi-gjoer/Positive-women
What is the correct regular expression?
Answer 0 (score: 0)
I would solve it as follows:
Code example:
// Requires the PHP Simple HTML DOM Parser (simple_html_dom.php) to be included.
// Load the HTML from a URL; if you already have it as a string, use str_get_html() instead.
$html = file_get_html('http://www.google.com/');
$links = [];
// Find all links
foreach ($html->find('a') as $element) {
    // $checkURL is the string to look for, e.g. 'myhost.com'
    if (strpos($element->href, $checkURL) !== false) {
        $links[] = $element->href;
    }
}
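If pulling in a third-party parser is not an option, the same idea can be sketched with PHP's built-in DOMDocument. This swaps Simple HTML DOM for the DOM extension; the helper name findLinksContaining and its parameters are made up for illustration, and the sample string is the one from the question.

```php
<?php
// Hypothetical helper (not part of any library): returns the href of every
// <a> element whose URL contains $needle, compared case-insensitively.
function findLinksContaining(string $html, string $needle): array
{
    // Collect libxml parse warnings internally instead of emitting them,
    // since the input is an HTML fragment rather than a full document.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (stripos($href, $needle) !== false) {
            $links[] = $href;
        }
    }
    return $links;
}

// Sample input copied from the question.
$string = "<a href='http://myhost.com/Se-hva-vi-gjoer'>Les mer </a>blbl blblb blblb blblb"
        . "<a href='https://myhost.com/Se-hva-vi-gjoer/Positive-women'>Les mer </a>";

$domLinks = findLinksContaining($string, 'myhost.com');
print_r($domLinks);
```

A parser-based approach tends to cope better with unusual quoting or attribute order than a regular expression, at the cost of parsing the whole fragment.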
Answer 1 (score: 0)
You can try this:
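// The dot in the host name is escaped so it only matches a literal ".".
// $linkMatches[1] holds the URLs without the surrounding quotes.
preg_match_all('/[\'"](https?:\/\/[^\'"]?myhost\.com[^\'"]*)[\'"]/i', $string, $linkMatches);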
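To show how the result of such a pattern might be used, here is a minimal sketch run against the sample string from the question. The helper name findHostLinks is an assumption for illustration, and the pattern is a simplified variant of the one above: it uses ~ as the delimiter and lets preg_quote escape the host.

```php
<?php
// Hypothetical helper (illustrative only): returns every quoted http(s) URL
// on $host that appears in $html.
function findHostLinks(string $html, string $host): array
{
    // preg_quote escapes regex metacharacters in the host (e.g. the dots),
    // and the second argument also escapes the "~" delimiter if present.
    $pattern = '~[\'"](https?://' . preg_quote($host, '~') . '[^\'"]*)[\'"]~i';

    preg_match_all($pattern, $html, $matches);

    // Group 1 is the URL without the surrounding quotes.
    return $matches[1];
}

// Sample input copied from the question.
$string = "<a href='http://myhost.com/Se-hva-vi-gjoer'>Les mer </a>blbl blblb blblb blblb"
        . "<a href='https://myhost.com/Se-hva-vi-gjoer/Positive-women'>Les mer </a>";

$regexLinks = findHostLinks($string, 'myhost.com');
print_r($regexLinks);
```

Note that this variant only matches URLs that start directly with the host after the scheme, so subdomains such as www.myhost.com would need the pattern to be relaxed.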