I want to scrape a website: link
I use GET from httr and get back a JSON-like object, but its keys are not quoted, as shown here:
"hxbase_json1({sum:3003,list:[{Number:'1'...
So jsonlite::fromJSON cannot read it.
My code is:
library(httr)

url <- 'http://stockdata.stock.hexun.com/zrbg/data/zrbList.aspx?'
date <- '2015-12-31'
page <- 1
res <- GET(url, query = list(date = date,
                             count = 20,
                             pname = 20,
                             titType = 'null',
                             page = page))
resC <- content(res)
# This is the call that fails: the response is not valid JSON
resC1 <- jsonlite::fromJSON(resC)
Is there a package that automatically adds the missing quotes to this kind of JSON? Or is there some other way to read it?
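Answer 0 (score: 4)
In the future, please post your R code and a proper URL. This is technically not JSON data; it is a JavaScript construct (the two are not the same). You can do some surgery on the response and get help from the V8 package: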
library(httr)
library(V8)
library(stringi)
library(magrittr) # provides %>%

res <- GET("http://stockdata.stock.hexun.com/zrbg/data/zrbList.aspx?date=2015-12-31&count=20&pname=20&titType=null&page=1&callback=hxbase_json11479871629254")

# Create a V8 JavaScript context
ctx <- v8()
# Turn the callback wrapper into a variable assignment and evaluate it
content(res) %>%
  stri_replace_first_fixed("hxbase_json1(", "var dat=") %>%
  stri_replace_last_fixed(")", "") %>%
  ctx$eval()
# Pull the JavaScript object back into R
ctx$get("dat") %>%
  dplyr::glimpse()
## List of 2
## $ sum : int 3003
## $ list:'data.frame': 20 obs. of 13 variables:
## ..$ Number : chr [1:20] "1" "2" "3" "4" ...
## ..$ StockNameLink: chr [1:20] "stock_bg.aspx?code=000002&date=2015-12-31" "stock_bg.aspx?code=601601&date=2015-12-31" "stock_bg.aspx?code=000550&date=2015-12-31" "stock_bg.aspx?code=000001&date=2015-12-31" ...
## ..$ industry : chr [1:20] "万科A(000002)" "中国太保(601601)" "江铃汽车(000550)" "平安银行(000001)" ...
## ..$ stockNumber : chr [1:20] "24.36" "24.07" "23.01" "18.69" ...
## ..$ industryrate : chr [1:20] "90.27" "86.41" "84.29" "84.14" ...
## ..$ Pricelimit : chr [1:20] "A" "A" "A" "A" ...
## ..$ lootingchips : chr [1:20] "15.00" "15.00" "9.03" "15.00" ...
## ..$ Scramble : chr [1:20] "15.00" "12.00" "20.00" "15.00" ...
## ..$ rscramble : chr [1:20] "8.00" "6.00" "18.00" "8.00" ...
## ..$ Strongstock : chr [1:20] "27.91" "29.34" "14.25" "27.45" ...
## ..$ Hstock : chr [1:20] " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-14/1202040307.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-28/1202085787.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-19/1202057166.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-10/1202033377.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ ...
## ..$ Wstock : chr [1:20] "<a href =\"http://stockdata.stock.hexun.com/000002.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/601601.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/000550.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/000001.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" ...
## ..$ Tstock : chr [1:20] "<img alt=\"\" onclick=\"addIStock('000002','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('601601','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('000550','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('000001','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" ...
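The same surgery can be tried offline, without hitting the site. The sketch below is only an illustration under stated assumptions: the string `raw` is a made-up sample shaped like the fragment shown in the question (everything after `Number:'1'` is hypothetical), and the callback name is stripped with a generic regular expression rather than the hard-coded `"hxbase_json1("` used above.

```r
library(V8)

# Hypothetical sample shaped like the response in the question; the field
# values beyond the fragment quoted there are invented for illustration.
raw <- "hxbase_json1({sum:3003,list:[{Number:'1',industry:'A'},{Number:'2',industry:'B'}]})"

# Drop everything up to and including the first "(" (the callback name),
# then drop the closing ")" and an optional trailing ";".
js_literal <- sub("^[^(]*\\(", "", raw)
js_literal <- sub("\\)[[:space:]]*;?[[:space:]]*$", "", js_literal)

# V8 accepts unquoted keys and single-quoted strings, which a strict JSON
# parser such as jsonlite::fromJSON rejects.
ctx <- v8()
ctx$eval(paste0("var dat = ", js_literal, ";"))

# ctx$get() converts the JavaScript object into R structures.
dat <- ctx$get("dat")
str(dat)
```

Matching the wrapper generically may be more forgiving if the server echoes a different callback name, but it assumes the response contains exactly one outer pair of parentheses around the object literal.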