Question

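I'm having trouble downloading a certain HTML element from a webpage in R, using xpathSApply and htmlParse from the XML package. I'm fairly (if not completely) new to web scraping with xmlValue, so I'm not sure what I need to do to get the information I need.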

Basically, the part of the page's code that I'm targeting is:

<div class="panel-body"> <div id="primarycitation"> <h4>Tetracycline Repressor Allostery Does not Depend on Divalent Metal Recognition. </h4>

So, after building the link to the webpage (inside a loop; hence the i)**, I used:

pdbId <- strtrim(pp2[i, 1], 4)
url2 <- paste("http://www.rcsb.org/pdb/explore/explore.do?structureId=", pdbId, sep = "")

val <- htmlParse(url2)
body <- xmlChildren(xmlRoot(val))$body
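Because paste0() and strtrim() are both vectorised, the URL-building step can also be done for all rows at once instead of once per loop iteration. This is a minimal sketch, assuming (as the loop suggests) that the first column of pp2 holds strings whose first four characters are the PDB ID; the helper name build_pdb_urls and the sample IDs are hypothetical.

```r
# Hypothetical helper: build one explore.do URL per ID in a single vectorised call.
# Assumes the first four characters of each ID are the PDB code.
build_pdb_urls <- function(ids) {
  pdb_ids <- strtrim(as.character(ids), 4)  # as.character() also handles factor columns
  paste0("http://www.rcsb.org/pdb/explore/explore.do?structureId=", pdb_ids)
}

# Hypothetical sample IDs, standing in for pp2[, 1]
urls <- build_pdb_urls(c("1ABC_A", "2XYZ"))
```

With the real data this would be called as build_pdb_urls(pp2[, 1]), and the loop would then only need to iterate over urls.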

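Then, to extract the citation, I ran:

script2 <- xpathSApply(body,
                       "//div[@id = 'primarycitation']",
                       xmlValue)

But all I get is garbage:

> script2
[1] "\n \n "

Again, I'm not very familiar with web scraping, but as far as I know, and based on my experience with every other function so far, this should return the citation title. Any suggestions?

** The loop is handled at the end; what I'm using here is xpathSApply.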
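One way to check the XPath itself, independent of the website, is to run it against the markup quoted above. This is a minimal sketch, assuming the page's HTML looks exactly like that snippet (closing tags added here so it parses as a complete document). If the same query on the live page still yields only whitespace, one possible cause (not verified here) is that the h4 is not in the HTML the server sends, for example because it is filled in later by JavaScript, which htmlParse does not run.

```r
library(XML)

# The markup quoted in the question, with closing tags added.
snippet <- '<html><body>
<div class="panel-body">
  <div id="primarycitation">
    <h4>Tetracycline Repressor Allostery Does not Depend on Divalent Metal Recognition. </h4>
  </div>
</div>
</body></html>'

doc <- htmlParse(snippet, asText = TRUE)

# The div-level query from the question: here it does contain the title,
# padded with the surrounding newlines and spaces.
div_text <- xpathSApply(doc, "//div[@id = 'primarycitation']", xmlValue)

# Targeting the <h4> directly returns only the title text;
# trimws() then drops the trailing space inside the tag.
title <- trimws(xpathSApply(doc, "//div[@id = 'primarycitation']/h4", xmlValue))
```

Comparing div_text here with the "\n \n " returned from the live page suggests whether the title text is present in the downloaded HTML at all.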

Missing information from xpathSApply in the XML package in R
