Question

How can I extract the text that comes after the br tags in the following lines:

<div id='population'>
    The Snow Leopard Survival Strategy (McCarthy <em>et al.</em> 2003, Table
    II) compiled national snow leopard population estimates, updating the work
    of Fox (1994). Many of the estimates are acknowledged to be rough and out
    of date, but the total estimated population is 4,080-6,590, as follows:<br>
    <br>
    Afghanistan: 100-200?<br>
    Bhutan: 100-200?<br>
    China: 2,000-2,500<br>
    India: 200-600<br>
    Kazakhstan: 180-200<br>
    Kyrgyzstan: 150-500<br>
    Mongolia: 500-1,000<br>
    Nepal: 300-500<br>
    Pakistan: 200-420<br>
    Russia: 150-200<br>
    Tajikistan: 180-220<br>
    Uzbekistan: 20-50
</div>

So far I have:

xpathSApply(h, '//div[@id="population"]', xmlValue)

But now I'm stuck...

Answer 1

It helps to realize that text is also a node. All text in the div that comes after a <br/> can be retrieved with:

//div[@id="population"]/text()[preceding-sibling::br]
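A minimal sketch of running that expression from R. It assumes h is a document parsed with htmlParse() from the XML package (the package that provides xpathSApply), and it uses an abbreviated copy of the question's markup. Each matched text node still carries its newline and indentation, so the result is trimmed and blank entries are dropped. Whether the whitespace-only node between the two leading <br> tags survives parsing can depend on parser settings, so the filter handles either case.

```r
library(XML)

# Abbreviated copy of the markup from the question (assumption: h is built
# with htmlParse(); the question does not show this step)
html <- "<div id='population'>
    The total estimated population is 4,080-6,590, as follows:<br>
    <br>
    Afghanistan: 100-200?<br>
    Bhutan: 100-200?<br>
    China: 2,000-2,500<br>
    Uzbekistan: 20-50
</div>"

h <- htmlParse(html, asText = TRUE)

# Text nodes of the div that have at least one <br> before them;
# the introductory sentence has none, so it is not selected
after_br <- xpathSApply(h, '//div[@id="population"]/text()[preceding-sibling::br]',
                        xmlValue)

# Strip newlines/indentation, then drop nodes that were only whitespace
after_br <- trimws(after_br)
after_br <- after_br[after_br != ""]

print(after_br)
# [1] "Afghanistan: 100-200?" "Bhutan: 100-200?"      "China: 2,000-2,500"
# [4] "Uzbekistan: 20-50"
```

Filtering in R keeps the XPath identical to the one above; an XPath 1.0 alternative is to append the predicate [normalize-space()] so blank text nodes are never selected.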

Technically, text "between <br/> tags" would mean:

//div[@id="population"]/text()[preceding-sibling::br and following-sibling::br]

...but I suspect that's not what you want.
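To see how the two expressions differ, the sketch below runs both under the same assumptions (htmlParse() from the XML package, an abbreviated copy of the question's markup). The last line, Uzbekistan: 20-50, has no <br> after it, so only the first expression returns it. The clean_text() helper is hypothetical, written just for this example.

```r
library(XML)

html <- "<div id='population'>
    The total estimated population is 4,080-6,590, as follows:<br>
    <br>
    Afghanistan: 100-200?<br>
    Bhutan: 100-200?<br>
    Uzbekistan: 20-50
</div>"

h <- htmlParse(html, asText = TRUE)

# Hypothetical helper: evaluate an XPath, trim each text node, drop blanks
clean_text <- function(doc, xpath) {
  x <- trimws(xpathSApply(doc, xpath, xmlValue))
  x[x != ""]
}

# Text after any <br>
after_br <- clean_text(h, '//div[@id="population"]/text()[preceding-sibling::br]')

# Text with a <br> both before and after it
between_br <- clean_text(h, '//div[@id="population"]/text()[preceding-sibling::br and following-sibling::br]')

# Lines returned by the first expression but not the second
print(setdiff(after_br, between_br))
# [1] "Uzbekistan: 20-50"
```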

XPath to extract text after br tags in R
