Scraping specific meta tag attributes in PHP using DOMDocument

Asked: 2015-01-21 06:37:02

Tags: php web-scraping domdocument

I am trying to extract the content of "meta" tags depending on their 'property' attribute. For example:

<meta name="keywords" content="9gag,fun,funny,lol,meme,GIF,wtf,omg,fail,video,cosplay,geeky,forever alone" />
<meta name="twitter:image" content="http://images-cdn.9gag.com/images/thumbnail-facebook/14198244_1420182794.8999_AmeJun_n.jpg" />
<meta property="og:title" content="I finished the manga last week, so I wanted to make my on &quot;What Naruto taught me&quot;" />
<meta property="og:site_name" content="9GAG" />
<meta property="og:url" content="http://9gag.com/gag/aGVqbvz" />

...

So I want to get only the content of the tags whose property contains 'og'. With a cURL request, I am able to get the property attribute.

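// curl() is the asker's own helper (not a PHP built-in); it is assumed to
// return the page HTML as a string.
$ch = curl("http://9gag.com/gag/aGVqbvz?ref=fsidebar");
$dom = new DOMDocument();
// @ suppresses the warnings loadHTML() emits for malformed real-world HTML.
@$dom->loadHTML($ch);

//echo $ch;
$links = $dom->getElementsByTagName('meta');
// Print the number of <meta> elements found.
echo $links->length;
echo '<pre>';
foreach ($links as $link) {
    // Empty string for tags that have no property attribute.
    echo $link->getAttribute("property");
    echo '<br>';
}
echo '</pre>';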

How can I get the content for only a specific property or name?

1 Answer:

Answer 0: (score: 0)

XPath is your friend. An expression like //meta[starts-with(@property, "og")]/@content fetches the content attribute of every meta element that has a property attribute whose value starts with "og".

Example

$xpath = new DOMXPath($dom);
$query = '//meta[starts-with(@property, "og")]/@content';
foreach ($xpath->query($query) as $node) {
    echo $node->value, "\n";
}

Output:

I finished the manga last week, so I wanted to make my on "What Naruto taught me"
9GAG
http://9gag.com/gag/aGVqbvz
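
The question also asks about selecting by a specific property or name. A minimal sketch of one way to do that, assuming the sample markup from the question stands in for the fetched page; the variable names $html, $og and $twitterImage are illustrative and not part of the original answer:

```php
<?php
// Sample markup copied from the question; in practice this would be the fetched page.
$html = <<<'HTML'
<html><head>
<meta name="keywords" content="9gag,fun,funny,lol,meme,GIF,wtf,omg,fail,video,cosplay,geeky,forever alone" />
<meta name="twitter:image" content="http://images-cdn.9gag.com/images/thumbnail-facebook/14198244_1420182794.8999_AmeJun_n.jpg" />
<meta property="og:title" content="I finished the manga last week, so I wanted to make my on &quot;What Naruto taught me&quot;" />
<meta property="og:site_name" content="9GAG" />
<meta property="og:url" content="http://9gag.com/gag/aGVqbvz" />
</head><body></body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Map each og:* property to its content value.
$og = [];
foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
    $og[$meta->getAttribute('property')] = $meta->getAttribute('content');
}

// Look up one tag by its name attribute; string() yields '' when it is absent.
$twitterImage = $xpath->evaluate('string(//meta[@name="twitter:image"]/@content)');

print_r($og);
echo $twitterImage, "\n";
```

Selecting the meta elements themselves, rather than their @content attributes, keeps both the property and the content available, so the results can be keyed by property name instead of returned as a flat list.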