Waiting for a JSON API to load without Docker

Asked: 2018-05-04 21:20:19

Tags: r json web-scraping

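I am trying to scrape JSON data from a government website using R and jsonlite, but it isn't grabbing everything, and I think that is because the page hasn't finished loading. My reason for thinking so is that I only get 1,000 rows, even though I believe there are closer to 32,000. Simple code: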

library(jsonlite)

url <- 'https://data.medicare.gov/resource/rmgi-5fhi.json'

hcahps <- fromJSON(url)

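hcahps is a 1000x30 data frame.

I don't want to run a remote server, because I don't think I'm allowed to at work, so RSelenium is probably out. I would also really rather not deal with that kind of setup for what I'm doing. Are there any other options?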

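1 answer:

Answer 0 (score: 1)

This should get you started. It's a Socrata API server, so a plain fromJSON() call only returns the first page of results (1,000 rows by default) and it needs some help to fetch the rest: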

library(RSocrata)

xdf <- RSocrata::read.socrata("https://data.medicare.gov/resource/rmgi-5fhi.json")

dim(xdf)
## [1] 263890     28

dplyr::glimpse(xdf)
## Observations: 263,890
## Variables: 28
## $ address                               <chr> "911 NORTHLAND DR", "5360 WEST CREOLE HWY", "6000 S...
## $ city                                  <chr> "PRINCETON", "CAMERON", "LOS ANGELES", "HOUSTON", "...
## $ county_name                           <chr> "SHERBURNE", "CAMERON", "LOS ANGELES", "HARRIS", "H...
## $ hcahps_answer_description             <chr> "Room was \"always\" clean", "\"Always\" quiet at n...
## $ hcahps_answer_percent                 <chr> "83", "Not Available", "Not Applicable", "Not Appli...
## $ hcahps_linear_mean_value              <chr> "Not Applicable", "Not Applicable", "Not Available"...
## $ hcahps_measure_id                     <chr> "H_CLEAN_HSP_A_P", "H_QUIET_HSP_A_P", "H_HSP_RATING...
## $ hcahps_question                       <chr> "Patients who reported that their room and bathroom...
## $ hospital_name                         <chr> "FAIRVIEW NORTHLAND REGIONAL HOSPITAL", "SOUTH CAME...
## $ location.type                         <chr> "Point", "Point", "Point", NA, "Point", "Point", "P...
## $ location.coordinates                  <list> [<-93.58893, 45.55888>, <-93.16524, 29.80717>, <-1...
## $ location_address                      <chr> "911 NORTHLAND DR", "5360 WEST CREOLE HWY", "6000 S...
## $ location_city                         <chr> "PRINCETON", "CAMERON", "LOS ANGELES", "HOUSTON", "...
## $ location_state                        <chr> "MN", "LA", "CA", "TX", "IN", "OH", "WI", "MI", "WA...
## $ location_zip                          <chr> "55371", "70631", "90036", "77004", "46037", "45662...
## $ measure_end_date                      <dttm> 2017-06-30, 2017-06-30, 2017-06-30, 2017-06-30, 20...
## $ measure_start_date                    <dttm> 2016-07-01, 2016-07-01, 2016-07-01, 2016-07-01, 20...
## $ number_of_completed_surveys           <chr> "406", "Not Available", "53", "FEWER THAN 50", "280...
## $ patient_survey_star_rating            <chr> "Not Applicable", "Not Applicable", "Not Applicable...
## $ phone_number                          <chr> "7633896481", "3375424111", "3239301040", "71352868...
## $ provider_id                           <chr> "240141", "190307", "050751", "450797", "150181", "...
## $ state                                 <chr> "MN", "LA", "CA", "TX", "IN", "OH", "WI", "MI", "WA...
## $ survey_response_rate_percent          <chr> "31", "Not Available", "31", "32", "27", "39", "34"...
## $ zip_code                              <chr> "55371", "70631", "90036", "77004", "46037", "45662...
## $ hcahps_answer_percent_footnote        <chr> NA, "1 - The number of cases/patients is too few to...
## $ number_of_completed_surveys_footnote  <chr> NA, "1 - The number of cases/patients is too few to...
## $ survey_response_rate_percent_footnote <chr> NA, "1 - The number of cases/patients is too few to...
## $ patient_survey_star_rating_footnote   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...

It may feel like it takes forever, because it kind of does. It's a large data frame, it takes a while to download, and there is no progress bar.
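If installing RSocrata is not an option, the 1,000-row result in the question can also be worked around by paging through the endpoint by hand. The following is only a rough sketch, assuming the endpoint accepts the standard Socrata `$limit`, `$offset` and `$order` query parameters; `fetch_all_pages()` is a hypothetical helper written for this illustration, not a function from jsonlite or RSocrata.

```r
# Sketch: page through a Socrata endpoint with $limit and $offset.
# fetch_all_pages() is a hypothetical helper, not part of any package.
# `fetch` defaults to jsonlite::fromJSON; any function that takes a URL
# and returns a data frame can be swapped in.
fetch_all_pages <- function(base_url, page_size = 1000, max_pages = Inf,
                            fetch = jsonlite::fromJSON) {
  pages <- list()
  offset <- 0
  while (length(pages) < max_pages) {
    # $order=:id keeps the row order stable from one request to the next.
    # sprintf("%d") avoids offsets being rendered as "1e+05".
    query <- sprintf("$order=:id&$limit=%d&$offset=%d", page_size, offset)
    page <- fetch(paste0(base_url, "?", query))
    n <- NROW(page)
    if (n == 0) break                 # nothing left to fetch
    pages[[length(pages) + 1]] <- page
    offset <- offset + n
    message(sprintf("fetched %d rows so far", offset))  # crude progress report
    if (n < page_size) break          # a short page means this was the last one
  }
  pages
}

# Usage (assumes the URL from the question still serves this dataset):
# pages  <- fetch_all_pages("https://data.medicare.gov/resource/rmgi-5fhi.json")
# hcahps <- jsonlite::rbind_pages(pages)
```

The helper returns a list of per-page data frames rather than one combined frame, so that `jsonlite::rbind_pages()` can do the combining. The `message()` call also gives a rough indication of progress during a long download.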