Adding quotes to JSON in R

Time: 2016-11-23 02:41:03

Tags: json r regex web-scraping jsonlite

I want to scrape a website: link

I use GET from httr and get a JSON-like object, but the keys are not quoted, like this:

"hxbase_json1({sum:3003,list:[{Number:'1'...

So jsonlite::fromJSON cannot read this JSON.
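A quick way to confirm that the parser is rejecting the syntax rather than the data is to validate a small sample. This is only a sketch: it assumes the jsonlite package is installed, and both strings are abbreviated, hypothetical stand-ins for the real response.

```r
library(jsonlite)

# Hypothetical, shortened copy of the response: a function call wrapping an
# object literal with unquoted keys and single-quoted strings
raw_txt <- "hxbase_json1({sum:3003,list:[{Number:'1'}]})"

# The same data written as strict JSON (double-quoted keys and strings)
json_txt <- '{"sum":3003,"list":[{"Number":"1"}]}'

# validate() reports whether a string is valid JSON; as.logical() drops the
# error-message attributes it attaches on failure
raw_ok  <- as.logical(jsonlite::validate(raw_txt))
json_ok <- as.logical(jsonlite::validate(json_txt))

print(c(raw = raw_ok, strict = json_ok))
```

Only the second string is something fromJSON can be expected to parse.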

My code is:

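library(httr)

url <- 'http://stockdata.stock.hexun.com/zrbg/data/zrbList.aspx?'
date <- '2015-12-31'
page <- 1

# Request one page of results; httr builds the query string from the list
res <- GET(url, query = list(date = date,
                             count = 20,
                             pname = 20,
                             titType = 'null',
                             page = page
                             ))

resC <- content(res)
# This call fails because the body is not valid JSON
resC1 <- jsonlite::fromJSON(resC)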

Is there a package that adds the quotes to the JSON automatically? Or is there some other way to read JSON like this?

1 answer:

Answer 0 (score: 4)

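For future questions, please post your R code and the correct URL. Technically this is not JSON data; it is a JavaScript construct (the two are not the same): a function call wrapping an object literal. You can do some surgery on the string and get help from the V8 package: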

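library(httr)
library(V8)
library(stringi)
library(magrittr)  # provides the %>% pipe used below

res <- GET("http://stockdata.stock.hexun.com/zrbg/data/zrbList.aspx?date=2015-12-31&count=20&pname=20&titType=null&page=1&callback=hxbase_json11479871629254")

# Create a JavaScript context to evaluate the response in
ctx <- v8()

# Turn the call `hxbase_json1({...})` into the assignment `var dat={...}`
# and evaluate it as JavaScript
content(res) %>% 
  stri_replace_first_fixed("hxbase_json1(", "var dat=") %>% 
  stri_replace_last_fixed(")", "") %>% 
  ctx$eval()

# Copy the JavaScript object back into R and inspect it
ctx$get("dat") %>% 
  dplyr::glimpse()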
## List of 2
##  $ sum : int 3003
##  $ list:'data.frame': 20 obs. of  13 variables:
##   ..$ Number       : chr [1:20] "1" "2" "3" "4" ...
##   ..$ StockNameLink: chr [1:20] "stock_bg.aspx?code=000002&date=2015-12-31" "stock_bg.aspx?code=601601&date=2015-12-31" "stock_bg.aspx?code=000550&date=2015-12-31" "stock_bg.aspx?code=000001&date=2015-12-31" ...
##   ..$ industry     : chr [1:20] "万科A(000002)" "中国太保(601601)" "江铃汽车(000550)" "平安银行(000001)" ...
##   ..$ stockNumber  : chr [1:20] "24.36" "24.07" "23.01" "18.69" ...
##   ..$ industryrate : chr [1:20] "90.27" "86.41" "84.29" "84.14" ...
##   ..$ Pricelimit   : chr [1:20] "A" "A" "A" "A" ...
##   ..$ lootingchips : chr [1:20] "15.00" "15.00" "9.03" "15.00" ...
##   ..$ Scramble     : chr [1:20] "15.00" "12.00" "20.00" "15.00" ...
##   ..$ rscramble    : chr [1:20] "8.00" "6.00" "18.00" "8.00" ...
##   ..$ Strongstock  : chr [1:20] "27.91" "29.34" "14.25" "27.45" ...
##   ..$ Hstock       : chr [1:20] " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-14/1202040307.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-28/1202085787.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-19/1202057166.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-10/1202033377.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ ...
##   ..$ Wstock       : chr [1:20] "<a href =\"http://stockdata.stock.hexun.com/000002.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/601601.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/000550.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/000001.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" ...
##   ..$ Tstock       : chr [1:20] "<img alt=\"\" onclick=\"addIStock('000002','1');\"  code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('601601','1');\"  code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('000550','1');\"  code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('000001','1');\"  code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" ...
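
The same surgery can be tried offline on a small sample before pointing it at the live site. The following is a minimal sketch, assuming the V8 package is installed. The sample string is a shortened, made-up stand-in for the response, and `strip_callback()` is a hypothetical helper (not part of any package) that removes the wrapper whatever the callback is called, instead of hard-coding `hxbase_json1(`.

```r
library(V8)

# Hypothetical, abbreviated stand-in for the response body
jsonp <- "hxbase_json1({sum:3003,list:[{Number:'1',industry:'A(000002)'},{Number:'2',industry:'B(601601)'}]})"

# Hypothetical helper: drop a leading `name(` and the trailing `)`.
# Both patterns are anchored, so parentheses inside the data are left alone.
strip_callback <- function(x) {
  x <- sub("^\\s*[A-Za-z_$][A-Za-z0-9_$.]*\\s*\\(", "", x, perl = TRUE)
  sub("\\)\\s*;?\\s*$", "", x, perl = TRUE)
}

# Evaluate the bare object literal as JavaScript, then copy it back into R
ctx <- v8()
ctx$eval(paste0("var dat = ", strip_callback(jsonp), ";"))
dat <- ctx$get("dat")

str(dat)
```

Letting a JavaScript engine read the literal avoids having to quote the keys with regular expressions, which is easy to get wrong when the values themselves contain colons, quotes, or HTML, as several columns in the output above do.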