Getting multiple values from HTML using XPath

Asked: 2013-04-12 19:02:05

Tags: php xpath

I want to extract multiple values from some HTML, and I think XPath may be the ideal way to do it.

What I'd like to do is loop through each tr with the class data and then, within the loop, get the data I need: the route number, the anchor text (also in the title), and the via text.

The HTML is as follows:

<tr class="data">
  <th class="route_number"><a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood"><span class="route_number small_curvy">1</span></a></th>
  <td class="main_and_via"> <a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a> <span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span> </td>
</tr>
<tr class="data">
  <th class="route_number"><a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole"><span class="route_number small_curvy">2</span></a></th>
  <td class="main_and_via"> <a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole</a> <span class="via"><strong>via</strong> Yardley Wood Road</span> </td>
</tr>

Would it be better to loop through each tr and then run separate queries for the route number, anchor text, and via text, or can it be done with a single XPath query?

2 answers:

Answer 0: (score: 0)

You can use XPath's "context" support:

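$rows = $xpath->query("//tr[@class='data']");

foreach ($rows as $row) {
    // query() is a method of DOMXPath, not of the node list, so call it on $xpath.
    // The leading "." keeps the search inside $row; a bare "//" would scan the whole document.
    // In the question's markup the route_number cell is a th, not a td.
    $route = $xpath->query(".//th[contains(@class, 'route_number')]", $row);
    // etc...
}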

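Note the $row in the second ->query() call. It provides the context from which the search starts. Provided the expression is relative (it begins with "." rather than "/"), XPath does not search the entire DOM tree, only the particular branch that $row points to.

Doing it this way guarantees that the .route_number you find is the one belonging to the $row you are processing, and not a .route_number from elsewhere in the tree.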
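As a fuller illustration of the context-node approach, here is a minimal sketch that collects all three values per row. It assumes the markup from the question (simplified here: the title attributes are dropped and the rows are wrapped in a table element), and the variable names $html, $doc, and $routes are made up for this example. It uses DOMXPath::evaluate() with string() so each lookup returns a plain string instead of a node list.

```php
<?php
// Simplified copy of the question's markup (title attributes omitted, rows wrapped in a table).
$html = <<<'HTML'
<table>
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab="><span class="route_number small_curvy">1</span></a></th>
<td class="main_and_via"> <a href="/routes/west-midlands/B001v/?tab=">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a> <span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span> </td></tr>
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab="><span class="route_number small_curvy">2</span></a></th>
<td class="main_and_via"> <a href="/routes/west-midlands/B002/?tab=">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole</a> <span class="via"><strong>via</strong> Yardley Wood Road</span> </td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings about the fragment quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$routes = [];
foreach ($xpath->query("//tr[@class='data']") as $row) {
    // Every expression starts with "." so it is evaluated relative to $row.
    $routes[] = [
        'number' => $xpath->evaluate("string(.//th[contains(@class, 'route_number')]//span)", $row),
        'title'  => $xpath->evaluate("string(.//td[@class='main_and_via']/a)", $row),
        // text() skips the <strong>via</strong> child and keeps only the span's own text.
        'via'    => trim($xpath->evaluate("string(.//span[@class='via']/text())", $row)),
    ];
}

print_r($routes);
```

One query per field costs a few more calls than a single expression, but a row with a missing cell simply yields an empty string for that field rather than shifting the other values.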

Answer 1: (score: 0)

You can query for all of the values you want in a single expression, provided they are always present:

(
    (//tr[@class = "data"])
        /*[@class="route_number"]//span
        |//tr[@class = "data"]/*[@class="main_and_via"]/a
        |//tr[@class = "data"]//*[@class="via"]
)/text()

Result:

#0: DOMText (length: 1) "1"
#1: DOMText (length: 50) "Dudley - Sedgley - Wolverhampton - Tettenhall Wood"
#2: DOMText (length: 32) " Dudley Road and Tettenhall Road"
#3: DOMText (length: 1) "2"
#4: DOMText (length: 71) "Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole"
#5: DOMText (length: 18) " Yardley Wood Road"
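Because the single query returns one flat list in document order, the values still have to be regrouped per row. The following is a rough sketch of one way to do that with array_chunk(), under the same assumption the answer states: every row contains all three values. The sample markup is simplified (title attributes dropped, rows wrapped in a table element), and the names $html, $values, and $rows are illustrative only.

```php
<?php
// Simplified copy of the question's markup (title attributes omitted, rows wrapped in a table).
$html = <<<'HTML'
<table>
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab="><span class="route_number small_curvy">1</span></a></th>
<td class="main_and_via"> <a href="/routes/west-midlands/B001v/?tab=">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a> <span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span> </td></tr>
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab="><span class="route_number small_curvy">2</span></a></th>
<td class="main_and_via"> <a href="/routes/west-midlands/B002/?tab=">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole</a> <span class="via"><strong>via</strong> Yardley Wood Road</span> </td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings about the fragment quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The union expression from the answer, unchanged.
$query = '(
    (//tr[@class = "data"])
        /*[@class="route_number"]//span
        |//tr[@class = "data"]/*[@class="main_and_via"]/a
        |//tr[@class = "data"]//*[@class="via"]
)/text()';

// Collect the text nodes as trimmed strings, in document order.
$values = [];
foreach ($xpath->query($query) as $textNode) {
    $values[] = trim($textNode->nodeValue);
}

// Regroup into [route number, anchor text, via text] triples.
// This only lines up if no row is missing a value.
$rows = array_chunk($values, 3);

print_r($rows);
```

If any row might lack a via span or a link, the triples would drift out of alignment, in which case querying each row separately with a context node is the safer choice.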
