Question

I want to parse an external web page and use PHP to extract all URLs and their link text from its content.

For example:

$content = '<a href="http://google.com" target="_blank"> google</a> is very good search engine <a href="http://gmail.com" target="_blank">Gmail </a> is provided by google.';

Expected output:

http://google.com     google
http://gmail.com      Gmail

Any suggestions would be much appreciated!

Answer 1

If you want to extract the URLs and link text with a regular expression, the following pattern should work:

<\s*a\s+[^>]*href\s*=\s*"(?<url>[^"]*)"[^>]*>(?<text>.*?)</a>
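To use a pattern like this in PHP, it needs delimiters, and preg_match_all() is required to collect every link rather than only the first. The sketch below assumes a non-greedy variant of the pattern (so one match cannot swallow several anchors) that also skips extra attributes such as target="_blank"; the $links array is an illustrative name, not part of the original answer.

```php
<?php
// Sample input from the question. Single quotes avoid clashing with the
// double quotes inside the HTML attributes.
$content = '<a href="http://google.com" target="_blank"> google</a> is very good search engine <a href="http://gmail.com" target="_blank">Gmail </a> is provided by google.';

// ~ is used as the delimiter so the "/" in "</a>" needs no escaping.
// [^>]* skips other attributes; .*? keeps each match inside one <a>...</a>.
// Flags: i = case-insensitive, s = "." also matches newlines.
$pattern = '~<\s*a\s+[^>]*href\s*=\s*"(?<url>[^"]*)"[^>]*>(?<text>.*?)</a>~is';

// PREG_SET_ORDER gives one array per match, with the named groups as keys.
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    // trim() removes the stray spaces around " google" and "Gmail ".
    $links[] = [$m['url'], trim($m['text'])];
}

foreach ($links as [$url, $text]) {
    echo $url . "\t" . $text . "\n";
}
```

This still inherits the usual weaknesses of regex-based HTML parsing: single-quoted or unquoted href values, for example, are not matched.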

However, parsing HTML with regular expressions is not a good idea; you can use PHP's DOM classes instead.

Edit

$content = '<a href="http://google.com" target="_blank"> google</a> is very good search engine <a href="http://gmail.com" target="_blank">Gmail </a> is provided by google.';
libxml_use_internal_errors(true); // keep libxml warnings about imperfect HTML from being printed
$html = new DOMDocument();
$html->loadHTML($content);
$anchors = $html->getElementsByTagName('a');
foreach ($anchors as $anchor) {
    // one "URL<TAB>text" line per link
    echo $anchor->getAttribute('href') . "\t" . trim($anchor->nodeValue) . "\n";
}
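The DOM approach can also be wrapped in a reusable function that returns the URL/text pairs instead of echoing them. This is only a sketch: extractLinks() is a hypothetical helper name, and a DOMXPath query for //a[@href] is swapped in for getElementsByTagName() so that anchors without an href attribute are skipped.

```php
<?php
/**
 * Hypothetical helper: returns [href, link text] pairs for every <a>
 * element that has an href attribute.
 */
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML: collect libxml warnings
    // internally, discard them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = [$anchor->getAttribute('href'), trim($anchor->textContent)];
    }
    return $links;
}

$content = '<a href="http://google.com" target="_blank"> google</a> is very good search engine <a href="http://gmail.com" target="_blank">Gmail </a> is provided by google.';

foreach (extractLinks($content) as [$url, $text]) {
    echo $url . "\t" . $text . "\n";
}
```

For an external page, the HTML could come from something like file_get_contents('https://example.com/') (a placeholder URL; fetching over HTTP requires allow_url_fopen to be enabled). The href values are returned exactly as written in the page, so relative links are not resolved to absolute URLs.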

Answer 2

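You can use this regex pattern: href="([a-zA-Z0-9://. ]+)"

Note that its character class only allows letters, digits, ":", "/", "." and spaces, so URLs containing characters such as -, ?, =, & or # will not match.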

Usage example:

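$pattern = '~href="([a-zA-Z0-9://. ]+)"~'; // PHP regexes need delimiters such as ~
$content = file_get_contents('FILE NAME HERE'); // path or URL of the page
preg_match_all($pattern, $content, $matches); // preg_match() would stop at the first link

print_r($matches[1]); // the captured URLs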

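This will list all the links (only the URLs; this pattern does not capture the link text). You can then parse them further.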
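Answer 2 leaves the "parse them" step open. One possible reading is splitting each URL into its components with PHP's built-in parse_url(). The sketch below assumes the question's sample string as input instead of a file, and $urls and $hosts are illustrative variable names.

```php
<?php
$content = '<a href="http://google.com" target="_blank"> google</a> is very good search engine <a href="http://gmail.com" target="_blank">Gmail </a> is provided by google.';

// Answer 2's pattern, wrapped in ~ delimiters so PHP accepts it.
$pattern = '~href="([a-zA-Z0-9://. ]+)"~';

// preg_match_all() finds every occurrence; $matches[1] holds group 1.
preg_match_all($pattern, $content, $matches);
$urls = $matches[1];

// parse_url() returns an array of components (scheme, host, path, ...)
// or false for seriously malformed URLs.
$hosts = [];
foreach ($urls as $url) {
    $parts = parse_url($url);
    if ($parts !== false && isset($parts['host'])) {
        $hosts[] = $parts['host'];
    }
}

print_r($hosts);
```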
