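I am trying to use R and jsonlite to scrape JSON data from a government website, but it is not grabbing everything, and I think that is because the page has not fully loaded. I suspect this because it only returns 1,000 rows, even though I believe there are close to 32,000. Simple code: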
library(jsonlite)
url <- 'https://data.medicare.gov/resource/rmgi-5fhi.json'
hcahps <- fromJSON(url)
hcahps is a 1000x30 data frame.
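I don't want to run a remote server, because I don't think I am allowed to at work, so RSelenium is probably out. And I would rather not deal with that for what I am doing. Are there any other options?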
Answer 0 (score: 1)
This should get you started. It is a Socrata API server, so it needs a little help:
library(RSocrata)
xdf <- RSocrata::read.socrata("https://data.medicare.gov/resource/rmgi-5fhi.json")
dim(xdf)
## [1] 263890 28
dplyr::glimpse(xdf)
## Observations: 263,890
## Variables: 28
## $ address <chr> "911 NORTHLAND DR", "5360 WEST CREOLE HWY", "6000 S...
## $ city <chr> "PRINCETON", "CAMERON", "LOS ANGELES", "HOUSTON", "...
## $ county_name <chr> "SHERBURNE", "CAMERON", "LOS ANGELES", "HARRIS", "H...
## $ hcahps_answer_description <chr> "Room was \"always\" clean", "\"Always\" quiet at n...
## $ hcahps_answer_percent <chr> "83", "Not Available", "Not Applicable", "Not Appli...
## $ hcahps_linear_mean_value <chr> "Not Applicable", "Not Applicable", "Not Available"...
## $ hcahps_measure_id <chr> "H_CLEAN_HSP_A_P", "H_QUIET_HSP_A_P", "H_HSP_RATING...
## $ hcahps_question <chr> "Patients who reported that their room and bathroom...
## $ hospital_name <chr> "FAIRVIEW NORTHLAND REGIONAL HOSPITAL", "SOUTH CAME...
## $ location.type <chr> "Point", "Point", "Point", NA, "Point", "Point", "P...
## $ location.coordinates <list> [<-93.58893, 45.55888>, <-93.16524, 29.80717>, <-1...
## $ location_address <chr> "911 NORTHLAND DR", "5360 WEST CREOLE HWY", "6000 S...
## $ location_city <chr> "PRINCETON", "CAMERON", "LOS ANGELES", "HOUSTON", "...
## $ location_state <chr> "MN", "LA", "CA", "TX", "IN", "OH", "WI", "MI", "WA...
## $ location_zip <chr> "55371", "70631", "90036", "77004", "46037", "45662...
## $ measure_end_date <dttm> 2017-06-30, 2017-06-30, 2017-06-30, 2017-06-30, 20...
## $ measure_start_date <dttm> 2016-07-01, 2016-07-01, 2016-07-01, 2016-07-01, 20...
## $ number_of_completed_surveys <chr> "406", "Not Available", "53", "FEWER THAN 50", "280...
## $ patient_survey_star_rating <chr> "Not Applicable", "Not Applicable", "Not Applicable...
## $ phone_number <chr> "7633896481", "3375424111", "3239301040", "71352868...
## $ provider_id <chr> "240141", "190307", "050751", "450797", "150181", "...
## $ state <chr> "MN", "LA", "CA", "TX", "IN", "OH", "WI", "MI", "WA...
## $ survey_response_rate_percent <chr> "31", "Not Available", "31", "32", "27", "39", "34"...
## $ zip_code <chr> "55371", "70631", "90036", "77004", "46037", "45662...
## $ hcahps_answer_percent_footnote <chr> NA, "1 - The number of cases/patients is too few to...
## $ number_of_completed_surveys_footnote <chr> NA, "1 - The number of cases/patients is too few to...
## $ survey_response_rate_percent_footnote <chr> NA, "1 - The number of cases/patients is too few to...
## $ patient_survey_star_rating_footnote <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
It may feel like it takes forever, because it kind of does. It is a large data frame, it takes a while to download, and there is no progress bar.
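If installing RSocrata is not an option, the same paging can be sketched with jsonlite alone. Socrata endpoints return 1,000 rows per request by default, which is why the original call stopped there; the `$limit` and `$offset` query parameters request further pages, and `$order=:id` is added here on the assumption that a stable sort order is wanted across pages. The helper names `page_url` and `fetch_all` are made up for this illustration, and the page size of 50000 is an assumption that may need lowering for a given server.

```r
library(jsonlite)

# Build the URL for one page of results.
# `$limit`, `$offset` and `$order` are Socrata (SODA) query parameters.
page_url <- function(base_url, limit, offset) {
  # sprintf("%d") avoids scientific notation such as "1e+05" in the URL.
  sprintf("%s?$limit=%d&$offset=%d&$order=:id", base_url, limit, offset)
}

# Hypothetical helper: request pages until a short or empty page comes back.
# `fetch` defaults to jsonlite::fromJSON and can be swapped out for testing.
fetch_all <- function(base_url, page_size = 50000, fetch = jsonlite::fromJSON) {
  pages <- list()
  offset <- 0
  repeat {
    page <- fetch(page_url(base_url, page_size, offset))
    # An empty JSON array parses to an empty list, so check length first.
    if (length(page) == 0 || nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
    # A page shorter than the requested size is the last one.
    if (nrow(page) < page_size) break
    offset <- offset + nrow(page)
  }
  if (length(pages) == 0) return(data.frame())
  # rbind_pages() copes with pages whose columns differ.
  jsonlite::rbind_pages(pages)
}

# Usage (makes network requests, so it is left commented out):
# hcahps <- fetch_all("https://data.medicare.gov/resource/rmgi-5fhi.json")
# dim(hcahps)
```

This still downloads everything in the dataset, so it will be about as slow as the RSocrata call; the only gain is one fewer package dependency.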