How can I extract the text that follows the <br> tags in the markup below?
<div id='population'>
The Snow Leopard Survival Strategy (McCarthy <em>et al.</em> 2003, Table
II) compiled national snow leopard population estimates, updating the work
of Fox (1994). Many of the estimates are acknowledged to be rough and out
of date, but the total estimated population is 4,080-6,590, as follows:<br>
<br>
Afghanistan: 100-200?<br>
Bhutan: 100-200?<br>
China: 2,000-2,500<br>
India: 200-600<br>
Kazakhstan: 180-200<br>
Kyrgyzstan: 150-500<br>
Mongolia: 500-1,000<br>
Nepal: 300-500<br>
Pakistan: 200-420<br>
Russia: 150-200<br>
Tajikistan: 180-220<br>
Uzbekistan: 20-50
</div>
I have got as far as:
xpathSApply(h, '//div[@id="population"]', xmlValue)
but now I'm stuck...
Answer 0 (score: 25)
It helps to realize that text is a node too. All text in the div that comes after a <br/> can be retrieved with:
//div[@id="population"]/text()[preceding-sibling::br]
Technically, text "between <br/> tags" would mean:
//div[@id="population"]/text()[preceding-sibling::br and following-sibling::br]
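...but I guess that is not what you want, since it would drop the last entry (Uzbekistan), which has no <br/> after it.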
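
As a rough sketch of how the two expressions behave when plugged into the `xpathSApply` call from the question, assuming the XML package is installed. The `html` string is a shortened copy of the div, and the names `html`, `after_br`, `between_br` and `clean` are made up for illustration:

```r
library(XML)

# Shortened copy of the div from the question (illustrative data only)
html <- '<div id="population">
Total estimated population is 4,080-6,590, as follows:<br>
<br>
Afghanistan: 100-200?<br>
Bhutan: 100-200?<br>
Uzbekistan: 20-50
</div>'

h <- htmlParse(html, asText = TRUE)

# Helper: strip surrounding whitespace and drop whitespace-only text nodes
clean <- function(x) {
  x <- trimws(x)
  x[nzchar(x)]
}

# Text nodes that have a <br> somewhere before them inside the div
after_br <- clean(xpathSApply(
  h, '//div[@id="population"]/text()[preceding-sibling::br]', xmlValue))

# Text nodes that have a <br> both before and after them
between_br <- clean(xpathSApply(
  h, '//div[@id="population"]/text()[preceding-sibling::br and following-sibling::br]',
  xmlValue))

print(after_br)    # includes the final "Uzbekistan: 20-50" entry
print(between_br)  # omits it, because no <br> follows that text node
```

The introductory sentence is excluded in both cases because no `<br>` precedes it. The text nodes come back with their surrounding newlines, and the empty line between the two consecutive `<br>` tags may show up as a whitespace-only node, which is why the sketch trims and filters the result.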