Question

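Can someone tell me how to use preg_match_all to capture the list of links on a target web page that point to the same site? All the links I'm trying to capture in the results look like this: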

<a href="/">Home</a>
<a href="/about-us">About Us</a>
<a href="/contact-us">Contact Us</a>

Examples of links I don't want included in the results:

<a href="http://www.facebook.com">Visit Us On Facebook</a>
<a href="https://www.paypal.com">Pay Now</a>

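I've spent an hour searching online and have only found examples that list every link on a page, not just the ones that belong to the same site.

Thanks.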

Answer 1

Here is a solution using DOM:

<?php
// loadHTML() is an instance method; calling it statically fails in PHP 8.
$dom = new DOMDocument();
$dom->loadHTML('
    <a href="/">Home</a>
    <a href="/about-us">About Us</a>
    <a href="/contact-us">Contact Us</a>
    <a href="http://www.facebook.com">Visit Us On Facebook</a>
    <a href="https://www.paypal.com">Pay Now</a>
');

$xpath = new DOMXPath($dom);
// Select <a> elements whose href starts with "/" (root-relative links).
$nodes = $xpath->query('//a[substring(@href, 1, 1) = "/"]');

$links = [];
foreach ($nodes as $node) {
    $links[] = $node->getAttribute('href');
}
print_r($links);
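One caveat worth considering: `substring(@href, 1, 1) = "/"` also matches protocol-relative URLs such as `//cdn.example.com/file`, which point to a different host. A minimal sketch that excludes them with XPath 1.0's `starts-with()` follows; the `//cdn.example.com/file` link is a made-up test input, not something from the question.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('
    <a href="/">Home</a>
    <a href="/about-us">About Us</a>
    <a href="//cdn.example.com/file">Protocol-relative link (other host)</a>
    <a href="http://www.facebook.com">Visit Us On Facebook</a>
');

$xpath = new DOMXPath($dom);
// Keep hrefs that start with a single "/" but not with "//" (protocol-relative).
$nodes = $xpath->query('//a[starts-with(@href, "/") and not(starts-with(@href, "//"))]');

$links = [];
foreach ($nodes as $node) {
    $links[] = $node->getAttribute('href');
}
print_r($links);
```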


You can also use preg_match() together with DOM, by registering it as a PHP function that the XPath query can call:

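$xpath = new DOMXPath($dom);

// Register the "php" namespace and allow XPath to call preg_match().
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('preg_match');

// Keep <a> elements whose href matches ~^/~ (preg_match() returns 1 on a match).
$nodes = $xpath->evaluate("//a[php:functionString('preg_match', '~^/~', @href)=1]");

$links = [];
foreach ($nodes as $node) {
    $links[] = $node->getAttribute('href');
}
print_r($links);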
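Both queries above keep only root-relative hrefs, so an absolute link back to the same site, such as `http://www.example.com/contact-us`, would be missed. If those should count too, one option is to read every href with DOM and compare hosts using `parse_url()`. This is a sketch under an assumption: `www.example.com` is a hypothetical stand-in for the target site's host.

```php
<?php
// Hypothetical host of the site being scraped; replace with the real one.
$siteHost = 'www.example.com';

$dom = new DOMDocument();
$dom->loadHTML('
    <a href="/">Home</a>
    <a href="/about-us">About Us</a>
    <a href="http://www.example.com/contact-us">Contact Us</a>
    <a href="http://www.facebook.com">Visit Us On Facebook</a>
    <a href="https://www.paypal.com">Pay Now</a>
');

$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    $parts = parse_url($href);
    if ($parts === false) {
        continue; // seriously malformed URL, skip it
    }
    // Relative links have neither a scheme nor a host (this also rules out mailto:).
    $isRelative = !isset($parts['scheme']) && !isset($parts['host']);
    // Absolute or protocol-relative links count only if the host matches.
    $isSameHost = isset($parts['host']) && strcasecmp($parts['host'], $siteHost) === 0;
    if ($isRelative || $isSameHost) {
        $links[] = $href;
    }
}
print_r($links);
```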


Answer 2

You can try the following regex, which matches every anchor tag whose href attribute begins with a / character.

<a href="(\/[^"]*)">[^<>]*<\/a>


Code:

<?php
$string = <<<EOT
<a href="/">Home</a>
<a href="/about-us">About Us</a>
<a href="/contact-us">Contact Us</a>
<a href="http://www.facebook.com">Visit Us On Facebook</a>
<a href="https://www.paypal.com">Pay Now</a>
EOT;
echo preg_match_all('~<a href="(\/[^"]*)">[^<>]*<\/a>~', $string, $matches);
print_r($matches[0]);
print_r($matches[1]);
?>

Output:

3Array
(
    [0] => <a href="/">Home</a>
    [1] => <a href="/about-us">About Us</a>
    [2] => <a href="/contact-us">Contact Us</a>
)
Array
(
    [0] => /
    [1] => /about-us
    [2] => /contact-us
)
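The pattern above matches only when `href` is the first and only attribute, is double-quoted, and the link text contains no nested tags. A somewhat looser sketch that still uses preg_match_all tolerates other attributes before `href` and either quote style, and skips protocol-relative `//` links. The `class="nav"`, single-quoted, and `//cdn.example.com` anchors are made-up test inputs. For arbitrary real-world HTML, the DOM approach is generally the more robust choice.

```php
<?php
$html = <<<EOT
<a href="/">Home</a>
<a class="nav" href="/about-us">About Us</a>
<a href='/contact-us'>Contact <b>Us</b></a>
<a href="//cdn.example.com/lib.js">CDN</a>
<a href="http://www.facebook.com">Visit Us On Facebook</a>
<a href="https://www.paypal.com">Pay Now</a>
EOT;

// Group 1: the opening quote; group 2: an href value starting with "/" but not "//".
// (?:[^>]*?\s)? skips any attributes that come before href inside the same tag.
$pattern = '~<a\s+(?:[^>]*?\s)?href\s*=\s*(["\'])(/(?!/)[^"\']*)\1~i';

$count = preg_match_all($pattern, $html, $matches);
echo $count, "\n";
print_r($matches[2]);
```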
