Question

<title>foo</title>
<meta name='description' content='foo' />

$url = 'http://www.google.com';

//CURL
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$site = curl_exec($ch);

//DOM
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($site);

$title=$dom->getElementsByTagName('title');
$description=$dom->getElementsByTagName('meta');

echo $title-> ;//need to access object
echo $ele-> tagDescription; //need access tag description

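I have a page that tries to use DOMDocument to scrape the page title, description, og:image, and so on from a URL.

I don't know how to access the resulting objects; does anyone know how to solve this?

What if there are multiple elements? Do I need to convert them to an array?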

Answer 1

$links = $dom->getElementsByTagName('meta');
foreach($links as $link){
    $name = $link->getAttribute('name');

    if($name == 'description'){$description = $link->getAttribute('content');}  
}
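
The same loop pattern covers the other parts of the question. Below is a minimal sketch, assuming the sample markup from the question plus a hypothetical og:image tag (its URL is a placeholder, and the $meta array is just an illustrative name). getElementsByTagName() returns a DOMNodeList, which can be iterated with foreach or indexed with item(), so no array conversion should be needed.

```php
<?php
// Sample markup; the og:image tag and its URL are assumptions for illustration.
$html = "<html><head><title>foo</title>"
      . "<meta name='description' content='foo' />"
      . "<meta property='og:image' content='http://example.com/a.png' />"
      . "</head><body></body></html>";

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);

// A DOMNodeList is not an array, but item(0) returns the first node (or null).
$titleNode = $dom->getElementsByTagName('title')->item(0);
$title = $titleNode !== null ? $titleNode->textContent : null;

// Collect every <meta> tag keyed by its name or property attribute.
$meta = [];
foreach ($dom->getElementsByTagName('meta') as $tag) {
    // getAttribute() returns '' when the attribute is absent.
    $key = $tag->getAttribute('name') ?: $tag->getAttribute('property');
    if ($key !== '') {
        $meta[$key] = $tag->getAttribute('content');
    }
}

echo $title, "\n";
echo $meta['description'] ?? '', "\n";
echo $meta['og:image'] ?? '', "\n";
```

If a real array is needed, for example to pass to array functions, iterator_to_array() can be applied to the DOMNodeList.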

Answer 2

You can use XPath:

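$selector = new DOMXPath($dom);
$node = $selector->query('//meta[@name="description"]/@content')->item(0);
// item(0) returns null when nothing matches, so guard before reading the value
$description = $node !== null ? $node->nodeValue : null;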

With XPath, you can select the <meta name="description" ...> node directly, without running a foreach loop over all the <meta> nodes.
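
The same approach should extend to the title and og:image that the question mentions. A minimal sketch, assuming the question's sample markup plus a hypothetical og:image tag (placeholder URL); it uses DOMXPath::evaluate() with the XPath string() function, which is a swapped-in variation on the query() call above rather than part of the original answer.

```php
<?php
// Sample markup; the og:image tag and its URL are assumptions for illustration.
$html = "<html><head><title>foo</title>"
      . "<meta name='description' content='foo' />"
      . "<meta property='og:image' content='http://example.com/a.png' />"
      . "</head><body></body></html>";

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// evaluate() with string() returns a plain PHP string, or '' when nothing matches,
// so no null check on a node is required.
$title       = $xpath->evaluate('string(//title)');
$description = $xpath->evaluate('string(//meta[@name="description"]/@content)');
$image       = $xpath->evaluate('string(//meta[@property="og:image"]/@content)');

// query() returns a DOMNodeList, so multiple matches can be looped over directly.
$contents = [];
foreach ($xpath->query('//meta/@content') as $attr) {
    $contents[] = $attr->nodeValue;
}

echo $title, "\n", $description, "\n", $image, "\n";
```

The trade-off is that string() only ever yields the first match, so query() with a loop remains the better fit when several elements are expected.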

DOMDocument: accessing tag content
