Web scraping in R by posting to a JavaScript search box

Time: 2017-08-17 11:18:45

Tags: r httr

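On this website, I want to enter the code "539300" in the search box at the top and get either the result (just the new URL) or some content from the resulting page (using XPath).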

library(rvest); library(httr); library(RCurl)

url <- "http://www.moneycontrol.com"


res <- POST(url, body = list(search_str = "539300"), encode = "form")

pg <- read_html(content(res, as="text", encoding="UTF-8"))

html_node(pg, xpath = '//*[@id="nChrtPrc"]/div[3]/h1')

This fails and returns:

{xml_missing}
<NA>
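
For context, `{xml_missing}` is what rvest prints when an XPath expression matches nothing in the parsed document; it is a placeholder node rather than an R error. The sketch below reproduces that on a small inline page. The markup is hypothetical and is not the real moneycontrol.com response; it only assumes the returned HTML lacks an element with id `nChrtPrc`.

```r
library(xml2)
library(rvest)

# Hypothetical markup: no element with id "nChrtPrc" is present
page <- read_html("<html><body><div id='search'><h1>Home</h1></div></body></html>")

# Same XPath as in the question: nothing matches, so a missing node comes back
missing_node <- html_node(page, xpath = '//*[@id="nChrtPrc"]/div[3]/h1')

# An XPath that does match returns a real node
found_node <- html_node(page, xpath = '//*[@id="search"]/h1')

# Test for the placeholder before extracting text
is_missing <- inherits(missing_node, "xml_missing")
heading <- html_text(found_node)

print(is_missing)
print(heading)
```

If the same check reports a missing node on the real response, the HTML returned by the POST probably does not contain the expected element, which is worth confirming by inspecting the response body.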

1 answer:

Answer 0 (score: 0)

Alternatively, just use the RCurl and XML libraries.

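library(RCurl)
library(XML)

# Fetch the quote page by its URL instead of going through the search box
url <- "http://www.moneycontrol.com/india/stockpricequote/miscellaneous/akspintex/AKS01"
curl <- getCurlHandle()
# followlocation = TRUE follows redirects; ssl.verifypeer = FALSE skips certificate checks
html <- getURL(url, curl = curl, .opts = list(ssl.verifypeer = FALSE), followlocation = TRUE)
# asText = TRUE makes it explicit that `html` holds markup, not a file name
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
# xmlValue converts each matched text node into a character string
h1 <- xpathSApply(doc, "//*[@id='nChrtPrc']/div[3]/h1//text()", xmlValue)
print(h1)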
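
The extraction step can be tried offline before hitting the site. This is a minimal sketch on hypothetical markup that only imitates the structure the XPath expects; the helper name `extract_heading` and the text "Example Co" are made up for illustration. It also guards against the case where nothing matches, in which `xpathSApply` returns an empty list.

```r
library(XML)

# Hypothetical markup imitating the structure the XPath expects
html <- paste0(
  "<html><body><div id='nChrtPrc'>",
  "<div></div><div></div><div><h1> Example Co </h1></div>",
  "</div></body></html>"
)
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# Hypothetical helper: first matching node's text, or NA if nothing matches
extract_heading <- function(doc, xpath) {
  values <- xpathSApply(doc, xpath, xmlValue)
  if (length(values) == 0) NA_character_ else trimws(values[[1]])
}

hit <- extract_heading(doc, "//*[@id='nChrtPrc']/div[3]/h1")
miss <- extract_heading(doc, "//*[@id='noSuchId']/h1")

print(hit)
print(miss)
```

Returning `NA` rather than an empty list makes a failed match easy to detect with `is.na()` when the same helper is applied to many pages.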