Question

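I really can't figure out why I can't extract some URLs from a website's source code with preg_match. Maybe I'm doing something wrong; I've tried a lot of things but I can't get it to work...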

The problem is that I'm trying to extract only the URLs from the source code that look like this:

<h2><a href="http://www.website.com/index.php" h="ID=SERP,5085.1">Website name</a></h2>

So the value I want is http://www.website.com/index.php

What I did is something like this:

preg_match_all('/<h2><a href=".*">/',$text,$m) ;

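$text is the source code. It is the source of a whole website and it is very long, so I only want to get the href from the <a> tags that are inside <h2> tags... I hope you can help me.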

Answer 1

You're asking for a regular expression here, but a regex is not the right tool for parsing HTML. Use the DOM instead:

$html = <<<DATA
<h2><a href="http://www.website.com/index.php" h="ID=SERP,5085.1">Website name</a></h2>
<h2><a href="http://www.example.com">Example site</a></h2>
<h1><a href="http://www.bar.com">Bar</a></h1>
<a href="http://www.foo.com">foo</a>
DATA;

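$dom = new DOMDocument;
$dom->loadHTML($html); // Load your HTML data..

$xpath = new DOMXPath($dom);

$links = []; // Initialize so print_r() works even when nothing matches
// Select only <a> elements that are direct children of an <h2>
foreach ($xpath->query("//h2/a") as $tag) {
   $links[] = $tag->getAttribute('href');
}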

print_r($links);

Output

Array
(
    [0] => http://www.website.com/index.php
    [1] => http://www.example.com
)
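
Since the question mentions a long, real-world page, the parser may emit warnings for markup it does not recognize. A minimal sketch of the same DOM approach with those warnings collected internally; the function name extractH2Links and the sample markup are assumptions made for illustration, not part of the answer above:

```php
<?php
// Hypothetical helper: returns the href of every <a> directly inside an <h2>.
function extractH2Links(string $html): array
{
    // Collect parser warnings internally instead of printing them,
    // since real pages often contain markup libxml complains about.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($dom);
    // [@href] skips anchors that have no href attribute at all.
    foreach ($xpath->query('//h2/a[@href]') as $tag) {
        $links[] = $tag->getAttribute('href');
    }
    return $links;
}

// Made-up sample input: one link inside an <h2>, one outside.
$sample = '<h2><a href="http://www.example.com">Example site</a></h2>'
        . '<nav><a href="http://www.foo.com">foo</a></nav>';
print_r(extractH2Links($sample));
```

Only the link inside the <h2> should end up in the returned array; the one inside <nav> is ignored by the XPath query.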

Answer 2

Try this:

<?php
$string = '<h2><a href="http://www.website.com/index.php" h="ID=SERP,5085.1">Website name</a></h2>';
$url = preg_replace('#.*href="([^\"]+)".*#', '\1', $string);
print_r($url);
?>

Output:

http://www.website.com/index.php
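
Note that preg_replace as used here returns one URL from a one-line string. To collect several links from a longer page, as the question describes, preg_match_all with a capture group may be a closer fit. A minimal sketch, assuming the markup always has the exact form <h2><a href="..."> with double quotes and href as the first attribute; the sample $text is made up for illustration:

```php
<?php
// Hypothetical sample input; in practice $text would hold the page source.
$text = <<<HTML
<h2><a href="http://www.website.com/index.php" h="ID=SERP,5085.1">Website name</a></h2>
<h2><a href="http://www.example.com">Example site</a></h2>
<h1><a href="http://www.bar.com">Bar</a></h1>
HTML;

// [^"]+ stops at the first closing quote, unlike the greedy .* in the
// question, which runs on to the last "> on the line.
preg_match_all('/<h2><a href="([^"]+)"/', $text, $m);

// $m[0] holds the full matches; $m[1] holds the first capture group,
// i.e. just the URL, for every match.
$urls = $m[1];
print_r($urls);
```

The pattern in the question has no capture group, so only $m[0] is filled and it contains the whole tag rather than the URL. This regex is tied to one exact tag layout; if the attribute order or quoting varies, the DOM approach is likely the safer choice.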

Get URLs using preg_match_all (simple)
