Getting a SimpleXMLElement from a meta description

Time: 2014-07-13 12:41:33

Tags: php html xml xpath simplexml

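I am trying to retrieve some metadata contained in a SimpleXMLElement. I am using XPath, and I am struggling to get the values I am interested in.

Below is an excerpt from the header of the web page (source: http://www.wayfair.de/CleverFurn-Couchtisch-Abby-69318X2-MFE2223.html).

Do you know how to retrieve all of the xmlns data into an array containing the following:

1) og:type 2) og:url 3) og:image ... x) og:upc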


<meta xmlns:og="http://opengraphprotocol.org/schema/" property="og:title" content="CleverFurn Couchtisch &quot;Abby&quot;" />


Here is my PHP code:

<?php
$html = file_get_contents("http://www.wayfair.de/CleverFurn-Couchtisch-Abby-69318X2-MFE2223.html");
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->recover=true;
@$doc->loadHTML("<html><body>".$html."</body></html>");

$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*/meta[@property='og:url']");

if (!is_null($elements)) {
    foreach ($elements as $element) {
        echo "<br/>[". $element->nodeName. "]";
        var_dump($element);
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue. "\n";
        }
    }
}
?>

1 answer:

Answer 0 (score: 1)

I just found the answer:

How to get Open Graph Protocol of a webpage by php?

<?php
$html = file_get_contents("http://www.wayfair.de/CleverFurn-Couchtisch-Abby-69318X2-MFE2223.html");
// Collect libxml parse warnings internally instead of suppressing them with @
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Select every <meta> element whose property attribute starts with "og:"
$query = "//*/meta[starts-with(@property, 'og:')]";
$metas = $xpath->query($query);
$rmetas = array(); // initialise so var_dump() works even when nothing matches
foreach ($metas as $meta) {
    // The value of a <meta> tag is in its content attribute, not in child nodes
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    $rmetas[$property] = $content;
}
var_dump($rmetas);
?>
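
The same approach can be tried without a network request by running it against the <meta> tag quoted in the question. The following is only a minimal sketch and not part of the original answer: the helper name extractOpenGraph is hypothetical, and the sample HTML is an assumption built from the og:title tag and the page URL given in the question, plus one made-up non-Open-Graph tag to show that it is filtered out.

```php
<?php
// Hypothetical helper (not from the original answer): collects every
// <meta property="og:..."> tag of an HTML string into an associative array.
function extractOpenGraph($html)
{
    // Collect libxml parse warnings internally instead of printing them,
    // and restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = array();
    foreach ($xpath->query("//meta[starts-with(@property, 'og:')]") as $meta) {
        // <meta> is an empty element: the value is stored in the "content"
        // attribute, so iterating over childNodes yields nothing.
        $result[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }
    return $result;
}

// Sample input (assumption): the og:title tag is the one quoted in the
// question; the other two tags are illustrative only.
$sample = '<html><head>'
    . '<meta xmlns:og="http://opengraphprotocol.org/schema/" property="og:title" content="CleverFurn Couchtisch &quot;Abby&quot;" />'
    . '<meta property="og:url" content="http://www.wayfair.de/CleverFurn-Couchtisch-Abby-69318X2-MFE2223.html" />'
    . '<meta name="description" content="not an Open Graph tag" />'
    . '</head><body></body></html>';

$og = extractOpenGraph($sample);
print_r($og);
```

With this sample, the resulting array should contain only the og:title and og:url keys, because the description tag has no property attribute. It also suggests why the loop in the question printed nothing: a <meta> element has no child nodes to iterate over.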