Pagination loop gets stuck at x > 99

Asked: 2019-04-27 11:16:44

Tags: r web-scraping pagination

I'm quite experienced with R, but scraping is new to me. I'm pulling some data from an API, and my code works fine as long as it is fetching pages 0 through 98. As soon as the loop reaches page 99, I get the error `Error: Internal Server Error (HTTP 500)`.

I've tried to find an answer, but I'm only fluent in R and C#, so I can't follow the Python (or other-language) solutions I've come across.
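For context, this is the arithmetic I use to work out the page count (the `total_results` figure below is made up for illustration; in the real code it comes from the parsed API response):

```r
# num_results is 100 per request, so the page count is total_results / 100.
total_results <- 9900                    # made-up figure for illustration
num_pages <- round(total_results / 100)  # 99
print(num_pages)
```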


keywords <- c('ABC OR DEF')

# Query parameters for the initial request (page 0)
parameters <- list(
  'q'           = keywords,
  'num_days'    = 1,
  'language'    = 'en',
  'num_results' = 100,
  'page'        = 0,
  'api_key'     = '123456'
)

response <- httr::GET(get_url, query = parameters)

# latest_page_number <- get_last_page(parsed)

httr::stop_for_status(response)

content <- httr::content(response, type = 'text', encoding = 'utf-8')

parsed  <- jsonlite::fromJSON(content, simplifyVector = FALSE, simplifyDataFrame = TRUE)

num_pages <- round(parsed[["total_results"]] / 100)
print(num_pages)

result <- parsed$results
for (x in 1:num_pages) {
  print(x)
  parameters <- list(
    'q'           = keywords,
    'page'        = x,
    'num_days'    = 7,
    'language'    = 'en',
    'num_results' = 100,
    'api_key'     = '123456'
  )
  response <- httr::GET(get_url, query = parameters)

  httr::stop_for_status(response)

  content <- httr::content(response, type = 'text', encoding = 'utf-8')
  # content <- httr::content(response)

  parsed  <- jsonlite::fromJSON(content, simplifyVector = FALSE, simplifyDataFrame = TRUE)

  Sys.sleep(0.2)

  result <- rbind(result, parsed$results[, colnames(result)])

}

0 Answers:
