<body class="en-us"> <div id="wrapper">
<div id="content">
<div class="content-top">
<div class="content-bot">
<div id="profile-wrapper" class=
"profile-wrapper profile-wrapper-horde">
<div class="profile-sidebar-anchor">
<div class="profile-sidebar-outer">
<div class="profile-sidebar-inner">
<div class="profile-sidebar-contents">
<div class="profile-sidebar-crest">
<a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style="">
</a>
<div class="profile-sidebar-info">
<div class="name">
<a href="/wow/en/character/some-server/sometoon/"
rel="np">Glitchshot</a>
</div>
<div class="under-name color-c8">
<span class="level"><strong>85</strong></span>
<a href="/wow/en/game/race/somerace" class="race">somerace</a>
<a href="/wow/en/game/class/someclass" class="class">someclass</a>
</div>
<div class="guild">
<a href="/wow/en/guild/some-server/someguild/?character=sometoon">
Some Guild</a>
</div>
<div class="realm">
<span id="profile-info-realm" class="tip"
data-battlegroup="Stormstrike">Black
Dragonflight</span>
</div>
</div>
</div>
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href=
"/wow/en/character/some-server/sometoon/" class=
"back-to" rel="np"><span class="arrow"><span class=
"icon">Character Summary</span></span></a></li>
<li class="root-menu"><a href=
"/wow/en/character/some-server/sometoon/achievement"
class="back-to" rel="np"><span class=
"arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=" active"><a href=
"/wow/en/character/some-server/sometoon/achievement#summary"
class="" rel="np"><span class="arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=""><a href=
"/wow/en/character/some-server/sometoon/achievement#92"
class="" rel="np"><span class="arrow"><span class=
"icon">General</span></span></a></li>
I know I've posted a lot of otherwise useless code here, but I hope it shows you what the DOM looks like.
From this:
<a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a>
I want to extract:
/wow/en/character/some-server/sometoon/achievement#92
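from the last anchor in the posted markup.
I have read as much as I could about using XPath queries to extract the information I need, but I'm clearly missing something. Below is the query I think should work, but it doesn't.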
<?php
$query = '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href';
echo $query . "<br>";
$achievementSubCategory = $xpath->query($query);
$achiSubArray = array("URL" => $achievementSubCategory->item(0)->nodeValue);
var_dump($achiSubArray);
// Produces array(1) { ["URL"]=> NULL } which should look something more like:
// array(1) { ["URL"]=> /wow/en/character/some-server/sometoon/achievement#92 }
?>
Thanks in advance for any help and advice.
Answer 0 (score: 1)
*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href
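This XPath expression has a few problems.
First, it looks for a ul element that is a grandchild of the current node and has a class attribute whose string value equals the string value of one of that ul's child elements named profile-sidebar-menu. Because profile-sidebar-menu is not quoted, XPath reads it as an element name rather than as a string literal. The ul has no child named profile-sidebar-menu, so the whole expression selects no nodes.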
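The other problem is the indexing. li[3] selects the third li child of the context node, but the wanted a element is a child of the fourth li child. That has to be written as li[4]. XPath positions are 1-based, not 0-based.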
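Both points can be checked directly from PHP. The following is a minimal sketch, assuming a trimmed-down, well-formed copy of the sidebar menu; the $xml string and the variable names are illustrative and not part of the original page.

```php
<?php
// Illustrative, well-formed stand-in for the sidebar menu (not the full page).
$xml = <<<'XML'
<div>
  <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
    <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
    <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
    <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
    <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
  </ul>
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Unquoted: @class is compared with child elements named profile-sidebar-menu.
$unquoted = $xpath->query('//ul[@class=profile-sidebar-menu]');
// Quoted: @class is compared with the string literal.
$quoted = $xpath->query('//ul[@class="profile-sidebar-menu"]');

echo $unquoted->length, "\n"; // 0
echo $quoted->length, "\n";   // 1

// Positions are 1-based: li[3] is the third item, li[4] the fourth.
$third  = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[3]/a/@href')->item(0)->nodeValue;
$fourth = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[4]/a/@href')->item(0)->nodeValue;

echo $third, "\n";  // /wow/en/character/some-server/sometoon/achievement#summary
echo $fourth, "\n"; // /wow/en/character/some-server/sometoon/achievement#92
```

The unquoted predicate is still valid XPath, which is why it fails silently with an empty result instead of raising an error.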
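With these two problems corrected, and with the extra ul steps dropped (in the posted markup the li elements are direct children of the ul whose class is profile-sidebar-menu, and each a is a direct child of its li), I believe the corrected expression should look like this:
*/ul[@class="profile-sidebar-menu"]/li[4]/a/@href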
Starting from body, the top element of the provided XML document, an absolute XPath expression that selects the wanted href attribute is:
/*/*/*/*/*/*/*/*/*/*/ul/li[4]/a/@href
Here is the XML document (the provided one, made well-formed by appending the missing end tags):
<body class="en-us">
<div id="wrapper">
<div id="content">
<div class="content-top">
<div class="content-bot">
<div id="profile-wrapper" class=
"profile-wrapper profile-wrapper-horde">
<div class="profile-sidebar-anchor">
<div class="profile-sidebar-outer">
<div class="profile-sidebar-inner">
<div class="profile-sidebar-contents">
<div class="profile-sidebar-crest">
<a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style=""></a>
<div class="profile-sidebar-info">
<div class="name">
<a href="/wow/en/character/some-server/sometoon/"
rel="np">Glitchshot</a>
</div>
<div class="under-name color-c8">
<span class="level">
<strong>85</strong>
</span>
<a href="/wow/en/game/race/somerace" class="race">somerace</a>
<a href="/wow/en/game/class/someclass" class="class">someclass</a>
</div>
<div class="guild">
<a href="/wow/en/guild/some-server/someguild/?character=sometoon">
Some Guild</a>
</div>
<div class="realm">
<span id="profile-info-realm" class="tip"
data-battlegroup="Stormstrike">Black
Dragonflight</span>
</div>
</div>
</div>
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li>
<a href=
"/wow/en/character/some-server/sometoon/" class=
"back-to" rel="np">
<span class="arrow">
<span class=
"icon">Character Summary</span></span>
</a>
</li>
<li class="root-menu">
<a href=
"/wow/en/character/some-server/sometoon/achievement"
class="back-to" rel="np">
<span class=
"arrow">
<span class=
"icon">Achievements</span></span>
</a>
</li>
<li class=" active">
<a href=
"/wow/en/character/some-server/sometoon/achievement#summary"
class="" rel="np">
<span class="arrow">
<span class=
"icon">Achievements</span></span>
</a>
</li>
<li class="">
<a href=
"/wow/en/character/some-server/sometoon/achievement#92"
class="" rel="np">
<span class="arrow">
<span class=
"icon">General</span></span>
</a>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
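You can verify that the absolute XPath expression above selects exactly the wanted href attribute by evaluating it with a tool such as the XPath Visualizer.
[Screenshot: the selection as performed with the XPath Visualizer]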
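The same kind of check can be done from PHP instead of a separate tool. The sketch below assumes a much shallower, illustrative document (three wrapper levels instead of ten, so three * steps) and uses DOMNode::getNodePath() to report an absolute path for the node that was matched, which saves counting levels by hand.

```php
<?php
// Illustrative document: three wrapper levels above the ul, not the ten of the real page.
$xml = <<<'XML'
<body>
  <div id="wrapper">
    <div class="profile-sidebar-contents">
      <div class="profile-sidebar-crest"/>
      <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
        <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
        <li><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
        <li><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
        <li><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
      </ul>
    </div>
  </div>
</body>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// One * step per wrapper level, then the same tail as in the answer.
$attr = $xpath->query('/*/*/*/ul/li[4]/a/@href')->item(0);
echo $attr->nodeValue, "\n";

// Ask libxml for an absolute path to the matched attribute node,
// then evaluate that path again to confirm it finds the same value.
$path  = $attr->getNodePath();
$again = $xpath->query($path)->item(0);
echo $path, "\n";
echo $again->nodeValue, "\n";
```

An absolute path like this breaks as soon as a wrapper element is added or removed, so it is better suited to verification than to production scraping.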
Answer 1 (score: 0)
If your DOM structure is consistent, something like the following should work:
//ul[@class='profile-sidebar-menu']/li[last()]/a/@href
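Your XPath statement doesn't match the document. The path contains several ul steps, but the sample isn't structured that way. Also, indexes in XPath start at 1, not 0.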
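For completeness, here is a minimal sketch of how that expression could be wired into the question's PHP. It assumes the page source is already available as a string (the $html value below is a shortened, illustrative stand-in that is deliberately left unclosed, like the posted markup) and that it is parsed with DOMDocument::loadHTML(), which tolerates the missing end tags.

```php
<?php
// Shortened stand-in for the fetched page; end tags are missing on purpose.
$html = <<<'HTML'
<body class="en-us"><div id="wrapper">
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href="/wow/en/character/some-server/sometoon/" class="back-to" rel="np">Character Summary</a></li>
<li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a></li>
HTML;

// Collect parser warnings about the sloppy markup instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//ul[@class='profile-sidebar-menu']/li[last()]/a/@href");

// Guard against an empty result rather than calling nodeValue on null.
$achiSubArray = array("URL" => $nodes->length > 0 ? $nodes->item(0)->nodeValue : null);
var_dump($achiSubArray);
```

Checking $nodes->length first separates "the query matched nothing" from other failures, which is what the NULL in the question's output was hiding.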
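Answer 2 (score: 0)
Based on the HTML you've shown above (and assuming the final tags are closed properly), ewh's expression should work fine.
Perhaps you've left out some important part of the document. Try being more specific: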
//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']/li/a[@href='/wow/en/character/some-server/sometoon/achievement#92']/@href
I'm fairly sure it works; I tested it online with the XPath Query Expression Tool.
If you still don't get a result, try showing all of the HTML you're working with.