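I am trying to parse the structure below into sets of journey options, so that I can find every possible way of getting from Pontypridd to Llangollen and back.
Using XPath, I can use
//div[@class='JourneyOptions']
to select all of the rows that actually contain journey information. Outside of XPath, I can then iterate over each row and decide whether it should be added to the current set of journeys or whether it is the first journey in a new set.
In the example below every set contains two journeys, but a set may contain only one journey (a "direct" journey) or more than two (several "connections").
Is there an XPath expression that selects all of the journeys in the first outbound set, all of the journeys in the second outbound set, and so on?
The first journey in each set has a radio input with an integer value. I could generate expressions dynamically from these values to fetch each set, but then I would need to know when to stop generating (or just wait for the XPath to fail).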
<div class='TableHolder'>
<p>...</p>
<h2 id='DirectionHeader'>Outbound Options</h2>
<p>Pontypridd to Llangollen, 30/11/1910</p>
<!-- first part of the first journey from Pontypridd to Llangollen -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ColumnOne'>
<input type='radio' checked='checked' name='out' value='1'>
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- second part of the first journey from Pontypridd to Llangollen -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ConnectingJournies'>
<p>...</p>
</div>
<div class='ColumnOne'>
... doesn't contain a radio input ...
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- first part of the second journey from Pontypridd to Llangollen -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ColumnOne'>
<input type='radio' name='out' value='2'>
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- second part of the second journey from Pontypridd to Llangollen -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ConnectingJournies'>
<p>...</p>
</div>
<div class='ColumnOne'>
... doesn't contain a radio input ...
</div>
... some more divs of parseable journey info ...
</div>
</div>
... some more outbound journey options ...
<p>...</p>
<h2 id='DirectionHeader'>Inbound Options</h2>
<p>Llangollen to Pontypridd, 07/11/1910</p>
<!-- first part of the first journey from Llangollen to Pontypridd -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ColumnOne'>
<input type='radio' checked='checked' name='in' value='1'>
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- second part of the first journey from Llangollen to Pontypridd -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ConnectingJournies'>
<p>...</p>
</div>
<div class='ColumnOne'>
... doesn't contain a radio input ...
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- first part of the second journey from Llangollen to Pontypridd -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ColumnOne'>
<input type='radio' name='in' value='2'>
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- second part of the second journey from Llangollen to Pontypridd -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ConnectingJournies'>
<p>...</p>
</div>
<div class='ColumnOne'>
... doesn't contain a radio input ...
</div>
... some more divs of parseable journey info ...
</div>
</div>
... some more inbound journey options ...
</div>
Sorry for the large example, but I think it is as small as I can make it while still representing my problem.
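For reference, the row-by-row grouping done outside of XPath could look roughly like the sketch below. It is only an illustration: it uses Python's standard `xml.etree.ElementTree`, the `SAMPLE` markup is a simplified, well-formed stand-in for the real page (self-closed `input` tags, no journey details), and `group_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the page (hypothetical sample data):
# two outbound options, each made of two JourneyOptions rows.
SAMPLE = """
<div class='TableHolder'>
  <div class='JourneyOptions'><div class='Journey'><div class='ColumnOne'>
    <input type='radio' name='out' value='1'/></div></div></div>
  <div class='JourneyOptions'><div class='Journey'><div class='ColumnOne'/></div></div>
  <div class='JourneyOptions'><div class='Journey'><div class='ColumnOne'>
    <input type='radio' name='out' value='2'/></div></div></div>
  <div class='JourneyOptions'><div class='Journey'><div class='ColumnOne'/></div></div>
</div>
"""

# Path from a JourneyOptions row down to the radio input that marks a new set.
RADIO = "div[@class='Journey']/div[@class='ColumnOne']/input[@type='radio']"


def group_rows(table):
    """Split the JourneyOptions rows into sets; a radio input starts a new set."""
    groups = []
    for row in table.findall("div[@class='JourneyOptions']"):
        # Start a new set on a radio row (or on the very first row, as a fallback).
        if row.find(RADIO) is not None or not groups:
            groups.append([])
        groups[-1].append(row)
    return groups


groups = group_rows(ET.fromstring(SAMPLE))
print([len(g) for g in groups])  # number of rows in each set
```

The `name` attribute of the radio input (`out` or `in`) could be read from the first row of each set to tell outbound sets from inbound ones.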
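Answer 0 (score: 0)
Node sets are just... well, sets: unique nodes, ordered in a way that depends on the host language (mostly document order). If you want the result to express some kind of hierarchy or grouping, the answer is that you can't.
So, you can select the start of each group: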
/div[@class='TableHolder']
/div[@class='JourneyOptions']
[div[@class='Journey']
/div[@class='ColumnOne']
/input[@type='radio']
]
And then one group at a time (there are many options for doing this):
/div[@class='TableHolder']
/div[@class='JourneyOptions']
[count(
(self::div|preceding-sibling::div)
[div[@class='Journey']
/div[@class='ColumnOne']
/input[@type='radio']
]
) = 1
]