How to extract the $children$html content from an XMLDocumentContent object

Date: 2013-09-30 14:41:41

Tags: html xml r

Apologies in advance. I'm sure this is simple, but I can't figure out what I'm doing wrong.

Among other things, this code..

study.name <- 'NLSY79'
library(XML)
library(httr)
sub.study <- paste0( "https://www.nlsinfo.org/investigator/servlet1?get=SUBSTUDIES&study=" , study.name )
study.html <- GET( sub.study )
content( study.html )
study.block <- htmlParse( study.html , asText = TRUE )

..gives me..

$children$html
<html>
 <body>
  <p>
   false
   <select id="thesubstudies" onchange="onSubstudyChanged(this);">
    <option value="-1" selected="selected">(Choose One)</option>
    <option value="343.06">NLSY79 (1979-2010)</option>
   </select>
  </p>
 </body>
</html>

I just want a quick (automated) way to extract "343.06".

Thanks!

2 answers:

Answer 0: (score: 3)

You can use xpathSApply to extract the elements you need:

xpathSApply(study.block, "//option")
# [[1]]
# <option value="-1" selected="selected">(Choose One)</option> 
# [[2]]
# <option value="343.06">NLSY79 (1979-2010)</option> 

and apply a function to them (xmlValue or xmlAttrs, depending on what you need):

xpathSApply(study.block, "//option", function(u) xmlAttrs(u)["value"])
#   value    value 
#    "-1" "343.06" 
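To make the difference between the two functions concrete, here is a minimal, self-contained sketch. It assumes the HTML printed in the question is supplied as a text string (the variable names `page`, `doc`, `labels`, and `values` are made up for illustration), so it runs without contacting the server. xmlValue returns the text inside each node, while xmlAttrs returns its attributes.

```r
library(XML)

# The HTML from the question, pasted in as text so no network access is needed
page <- '<html><body><p>false
<select id="thesubstudies" onchange="onSubstudyChanged(this);">
<option value="-1" selected="selected">(Choose One)</option>
<option value="343.06">NLSY79 (1979-2010)</option>
</select></p></body></html>'

doc <- htmlParse(page, asText = TRUE)

# xmlValue gives the visible text of each <option> node
labels <- xpathSApply(doc, "//option", xmlValue)

# xmlAttrs gives the attributes; keep only "value" and drop the names
values <- unname(xpathSApply(doc, "//option", function(u) xmlAttrs(u)["value"]))

print(labels)
print(values)
```

Here `labels` holds the option captions and `values` holds the matching codes, so the two vectors can be paired up if you need to look a code up by its caption.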

Answer 1: (score: 1)

You can also use xmlGetAttr:

xpathSApply(study.block, "//option", xmlGetAttr, "value")
# [1] "-1"     "343.06"

xpathSApply(study.block, "//option[not(@selected)]", xmlGetAttr, "value")
# [1] "343.06"
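If the page could ever contain other option lists, a slightly stricter variant may be safer. The sketch below is an assumption-laden illustration rather than part of the original answer: it restricts the XPath to the select element with id "thesubstudies", skips the "-1" placeholder by its value instead of by the selected attribute, and converts the result to a number. The HTML is pasted in as text, and the names `page`, `doc`, `ids`, and `substudy.id` are hypothetical.

```r
library(XML)

# The HTML from the question, pasted in as text so no network access is needed
page <- '<html><body><p>false
<select id="thesubstudies" onchange="onSubstudyChanged(this);">
<option value="-1" selected="selected">(Choose One)</option>
<option value="343.06">NLSY79 (1979-2010)</option>
</select></p></body></html>'

doc <- htmlParse(page, asText = TRUE)

# Only look inside the substudy dropdown and ignore the "-1" placeholder entry
ids <- xpathSApply(
  doc,
  "//select[@id='thesubstudies']/option[@value != '-1']",
  xmlGetAttr, "value"
)

# Attribute values come back as character strings; convert if a number is needed
substudy.id <- as.numeric(ids)
print(substudy.id)
```

Filtering on the value rather than on the selected attribute keeps working even if the server marks a real substudy as the selected one.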