Question

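I want to extract the href attribute, specifically from links whose href uses mailto:. I also want to do this not just for one link, but for every such link on the main web page.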

I tried:

<?php

$url = "https://www.omurcanozcan.com";

$html = file_get_contents( $url);

libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML( $html);
$xpath = new DOMXpath( $doc);
$node = $xpath->query( "//a[@href='mailto:']")->item(0);


echo $node->textContent; // This will print **GET THIS TEXT**

 ?>

For example, if the page contains

<a href='mailto:omurcan@omurcanozcan.com'>omurcan@omurcanozcan.com</a>

I want to echo

<p>omurcan@omurcanozcan.com</p>

Answer 1

The main problem is that in your XPath you are checking for

//a[@href='mailto:']

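This matches only an href attribute whose value is exactly mailto:, with nothing after it. What you want is an href that starts with mailto:, and for that you can use starts-with():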

$node = $xpath->query( "//a[starts-with(@href,'mailto:')]")->item(0);
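Because the question asks for every mailto link rather than only the first, the same query can be looped over instead of calling item(0). The following is a minimal sketch under some assumptions: the inline $html string is made-up sample markup standing in for the fetched page, and the address is taken from the href (with the mailto: prefix and any ?subject=... part removed) rather than from the link text, since the visible text is not always the address.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<html><body>
<a href="mailto:omurcan@omurcanozcan.com">omurcan@omurcanozcan.com</a>
<a href="https://www.omurcanozcan.com/about">About</a>
<a href="mailto:info@example.com?subject=Hello">Contact</a>
</body></html>
HTML;

libxml_use_internal_errors(true);   // suppress warnings from imperfect HTML
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$emails = [];
foreach ($xpath->query("//a[starts-with(@href,'mailto:')]") as $link) {
    // Drop the "mailto:" scheme, then anything after "?" (subject, body, ...).
    $address = substr($link->getAttribute('href'), strlen('mailto:'));
    $address = explode('?', $address, 2)[0];
    $emails[] = $address;
    echo '<p>' . htmlspecialchars($address) . '</p>' . PHP_EOL;
}
```

If you would rather print the visible link text, as in the question, replace the address extraction with $link->textContent.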

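The second thing is that I don't think the page is fully loaded when you fetch the content. My usual test is to save the HTML right after loading it, so that I can inspect it first: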

$url = "https://www.omurcanozcan.com";

$html = file_get_contents( $url);
file_put_contents("a.html", $html);

If you then look at a.html, you can see the HTML the script is actually working with, and I can't see any mailto: links in that content.
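The same check can be done in code instead of by eye. This is only a sketch: countMailtoLinks() is a hypothetical helper, not something from the original answer, and it simply counts occurrences of the string mailto: in the raw markup. A count of zero for the fetched page would suggest the links are not present in the HTML the server returns (for example, because they are added later by JavaScript), in which case no XPath expression will find them.

```php
<?php
// Hypothetical helper: counts "mailto:" occurrences in raw HTML,
// ignoring letter case.
function countMailtoLinks(string $html): int
{
    return substr_count(strtolower($html), 'mailto:');
}

// Made-up sample strings; in practice pass the result of file_get_contents().
$withLink    = '<a href="mailto:a@example.com">a</a><a href="/home">Home</a>';
$withoutLink = '<p>No links here</p>';

echo countMailtoLinks($withLink), PHP_EOL;     // prints 1
echo countMailtoLinks($withoutLink), PHP_EOL;  // prints 0
```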
