Code
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
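How can I use an XPath query to extract the ID numbers from the href attributes of these tags?
For this example I want the following result:
5809,124,196,24/11/1859
PHP code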
$url = 'http://www.example.com/Books/Default.aspx';
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);
$elements1 = $xpath->query('//a[contains(@href, "www.example.com/Book/")]');
$elements2 = $xpath->query('//a[contains(@href, "www.example.com/author/id=")]');
$elements3 = $xpath->query('//a[contains(@href, "www.example.com/genres/")]');
$elements4 = $xpath->query('//span[contains(@class, "")]');
if (!is_null($elements)) {
    foreach ($elements as $element) {
        echo "<br/>". "";
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue. "\n";
        }
    }
}
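Answer 0 (score: 0)
XPath 1.0 has a limited set of string functions, but at some point it is much easier to just read the attribute and extract the value with a regular expression.
Here is an example that uses only XPath: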
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;
$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);
$data = [
    'book_title' => $xpath->evaluate(
        'string(//a[contains(@href, "www.example.com") and contains(@href, "/book")])'
    ),
    'book_id' => $xpath->evaluate(
        'substring-before(
            substring-after(
                //a[contains(@href, "www.example.com") and contains(@href, "/book")]/@href,
                "www.example.com/"
            ),
            "/"
        )'
    ),
    'author_id' => $xpath->evaluate(
        'substring-after(
            //a[contains(@href, "www.example.com/author/id=")]/@href,
            "/id="
        )'
    )
];
var_dump($data);
Output:
array(3) {
  ["book_title"]=>
  string(17) "Origin of Species"
  ["book_id"]=>
  string(4) "5809"
  ["author_id"]=>
  string(3) "124"
}
These expressions only work with DOMXpath::evaluate(), because they return strings. DOMXpath::query() can only return node lists.
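To make the difference concrete, here is a minimal sketch that fetches the same author ID both ways. The one-line HTML input and the variable names are assumptions chosen for illustration. With query() the string handling has to happen in PHP, while evaluate() lets the XPath string function return the value directly.

```php
<?php
$document = new DOMDocument();
$document->loadHTML('<a href="http://www.example.com/author/id=124">Darwin</a>');
$xpath = new DOMXPath($document);

// query() returns a DOMNodeList, so the attribute is read from the node in PHP.
$nodes = $xpath->query('//a[contains(@href, "/author/id=")]');
$href = $nodes->item(0)->getAttribute('href');

// evaluate() can return a scalar, so the string function runs inside XPath.
$authorId = $xpath->evaluate(
    'substring-after(//a[contains(@href, "/author/id=")]/@href, "/id=")'
);

var_dump($nodes instanceof DOMNodeList, $href, $authorId);
```

evaluate() also accepts plain location paths and then returns a node list, which is why the loop example further down can use it for both purposes.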
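In most cases you will use one expression to fetch a list of nodes, iterate over them, and use several further expressions to read the values relative to each node. Here is a simplified example: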
$html = <<<'HTML'
<div class="book">
<a href="#1">Origin of Species</a>
</div>
<div class="book">
<a href="#2">On the Shoulders of Giants</a>
</div>
HTML;
$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);
foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    var_dump(
        $xpath->evaluate('string(.//a)', $book),
        $xpath->evaluate('string(.//a/@href)', $book)
    );
}
Output:
string(17) "Origin of Species"
string(2) "#1"
string(26) "On the Shoulders of Giants"
string(2) "#2"
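For the regular expression route mentioned at the start of this answer, the following is a rough sketch rather than a definitive solution. It selects the links with XPath, reads each href, and pulls the numeric ID out with preg_match(). The patterns and the array keys (book_id, author_id, genre_id, date) are assumptions based on the three URL shapes in the question, and the span is matched by the class value shown there, which may differ on the real page.

```php
<?php
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Hypothetical patterns, one per URL shape; each captures the numeric ID.
$patterns = [
    'book_id'   => '~/(\d+)/book$~',
    'author_id' => '~/author/id=(\d+)$~',
    'genre_id'  => '~/(\d+)/genres$~',
];

$result = [];
foreach ($xpath->query('//a[@href]') as $link) {
    $href = $link->getAttribute('href');
    foreach ($patterns as $key => $pattern) {
        if (preg_match($pattern, $href, $match)) {
            $result[$key] = $match[1];
        }
    }
}

// The date is plain text content, so a string() expression is enough.
$result['date'] = $xpath->evaluate('string(//span[@class="Xbkznofv"])');

echo implode(',', $result), "\n"; // prints 5809,124,196,24/11/1859
```

Keeping the patterns in an array means a new URL shape only needs one more entry, and links that match none of the patterns are simply skipped.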