This is my HTML file:
<html>
<head>
<link href='http://wendyandgabe.blogspot.com/favicon.ico' rel='icon' type='image/x-icon'/>
<link href='http://wendyandgabe.blogspot.com/' rel='canonical'/>
<link rel="alternate" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://wendyandgabe.blogspot.com/feeds/posts/default" />
<link rel="alternate" type="application/rss+xml" title="O' Happy Day! - RSS" href="http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss" />
<link rel="service.post" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://www.blogger.com/feeds/5390468261501503598/posts/default" />
</head>
<body>
</body>
</html>
I want to extract the href URL of the link with type="application/rss+xml" from the HTML file above. How can this be done? Can anyone show some sample code?
Answer 0 (score: 2)
You can use
DomDocument http://php.net/manual/de/class.domdocument.php
and
DomXPath http://de3.php.net/manual/de/class.domxpath.php
$html = <<<EOF
<html>
<head>
<link href='http://wendyandgabe.blogspot.com/favicon.ico' rel='icon' type='image/x-icon'/>
<link href='http://wendyandgabe.blogspot.com/' rel='canonical'/>
<link rel="alternate" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://wendyandgabe.blogspot.com/feeds/posts/default" />
<link rel="alternate" type="application/rss+xml" title="O' Happy Day! - RSS" href="http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss" />
<link rel="service.post" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://www.blogger.com/feeds/5390468261501503598/posts/default" />
</head>
<body>
</body>
</html>
EOF;
$xml = new DOMDocument();
$xml->loadHTML($html);
// create an XPath instance
$xpath = new DOMXPath($xml);
// query for <link type="application/rss+xml"> and use the first match
$link = $xpath->query('//link[@type="application/rss+xml"]')->item(0);
// item(0) returns null when nothing matches, so check before reading the attribute
if ($link !== null) {
    var_dump($link->getAttribute('href'));
}
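For pages fetched from the web, loadHTML() tends to emit warnings on imperfect markup, and the query may match nothing. A minimal sketch of a reusable helper under those assumptions follows; the function name findLinkHref is hypothetical and not part of the answer above, and it assumes the type string contains no double quotes.

```php
<?php
// Hypothetical helper: returns the href of the first <link> with the given
// type attribute, or null when no such element exists.
function findLinkHref(string $html, string $type): ?string
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: $type has no double quotes, since it is embedded in the expression.
    $nodes = $xpath->query('//link[@type="' . $type . '"]');

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $link = $nodes->item(0);
    return $link instanceof DOMElement ? $link->getAttribute('href') : null;
}

$sample = '<html><head>'
    . '<link rel="alternate" type="application/rss+xml" '
    . 'href="http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss" />'
    . '</head><body></body></html>';

echo findLinkHref($sample, 'application/rss+xml'), "\n";
```

Returning null rather than failing lets the caller decide what to do when a page has no RSS link.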
Answer 1 (score: 0)
You can try the PHP class DOMDocument.
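This answer gives no code, so here is a rough sketch of one way DOMDocument could be used on its own, without XPath: walk the link elements and compare their type attribute. The variable names and the shortened HTML sample are illustrative assumptions, not the answerer's code.

```php
<?php
// Shortened copy of the HTML from the question; a nowdoc keeps the quotes literal.
$html = <<<'EOF'
<html>
<head>
<link href='http://wendyandgabe.blogspot.com/' rel='canonical'/>
<link rel="alternate" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://wendyandgabe.blogspot.com/feeds/posts/default" />
<link rel="alternate" type="application/rss+xml" title="O' Happy Day! - RSS" href="http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss" />
</head>
<body>
</body>
</html>
EOF;

$doc = new DOMDocument();
$doc->loadHTML($html);

// Stays null if no <link> with the wanted type is present.
$rssHref = null;

// getElementsByTagName() returns a DOMNodeList, which can be iterated directly.
foreach ($doc->getElementsByTagName('link') as $link) {
    if ($link->getAttribute('type') === 'application/rss+xml') {
        $rssHref = $link->getAttribute('href');
        break; // stop at the first match
    }
}

echo $rssHref ?? 'No RSS link found', "\n";
```

The loop is more verbose than an XPath query, but it may be easier to follow if the filtering rules grow beyond a single attribute.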
Answer 2 (score: 0)
Use PHP Simple HTML DOM Parser, as follows:
// includes Simple HTML DOM Parser
include "simple_html_dom.php";
$text = '<html>
<head>
<link href="http://wendyandgabe.blogspot.com/favicon.ico" rel="icon" type="image/x-icon"/>
<link href="http://wendyandgabe.blogspot.com/" rel="canonical"/>
<link rel="alternate" type="application/atom+xml" title="O\' Happy Day! - Atom" href="http://wendyandgabe.blogspot.com/feeds/posts/default" />
<link rel="alternate" type="application/rss+xml" title="O\' Happy Day! - RSS" href="http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss" />
<link rel="service.post" type="application/atom+xml" title="O\' Happy Day! - Atom" href="http://www.blogger.com/feeds/5390468261501503598/posts/default" />
</head>
<body>
</body>
</html>';
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($text);
// Find the link with the appropriate selectors
$link = $html->find('link[type=application/rss+xml]', 0);
// find() returns null when nothing matches, so check the result first
if ($link) {
    $href = $link->href;
    echo $href;
} else {
    echo "Find function failed!";
}
// Clear the DOM object to free memory (mainly matters when parsing many documents)
$html->clear();
unset($html);
OUTPUT
======
http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss