Scraping data from a JavaScript chart and drop-down menu

Time: 2019-02-10 12:26:17

Tags: r parsing web-scraping

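I am trying to scrape data from https://www.snowyhydro.com.au/our-energy/water/storages/lake-levels-calculator/ using R. The page has drop-down menus for selecting different years, and I would like to use them to scrape the lake levels for each year. I have searched online for code examples, but I am struggling to find a starting point for getting the yearly values for the different lakes with R.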

I tried using SelectorGadget here, but it does not work, I think because the chart is rendered with JavaScript.

library('rvest')

url <- 'https://www.snowyhydro.com.au/our-energy/water/storages/lake-levels-calculator/'
webpage <- read_html(url)

I am looking for a tabular result with the daily storage levels for all of the lakes.

1 Answer:

Answer 0: (score: 0)

I was able to find a better URL to request the data from: https://www.snowyhydro.com.au/wp-content/themes/basic/get_dataxml.php

The JSON response from that request does not map cleanly onto a table, but I think the functions below should do the conversion for you:

library(httr)
library(jsonlite)

# This function is called from within the other to convert each day 
# to its own dataframe, creating extra columns for the year, month, and day
entry.to.row <- function(entry) {
  date = entry[["-date"]]
  entry.df = data.frame(
    matrix(unlist(entry$lake), nrow = length(entry$lake), byrow = TRUE),
    stringsAsFactors = FALSE
  )
  colnames(entry.df) = c("LakeName", "Date","Measurement")
  entry.df$Date = date

  date.split = strsplit(date, split = "-")[[1]]
  entry.df$Year = date.split[1]
  entry.df$Month = date.split[2]
  entry.df$Day = date.split[3]
  entry.df
}

# Fetch the data for two years and convert them into two data.frames which 
# we will then merge into a single data.frame
fetch.data <- function(
  base.url = "https://www.snowyhydro.com.au/wp-content/themes/basic/get_dataxml.php",
  current,
  past
) {
  fetched = httr::POST(
    url = base.url, 
    body = list("year_current"=current, "year_pass"=past)
  )

  datJSON = fromJSON(content(fetched, as = "text"), simplifyVector = F)

  pastJSON = datJSON$year_pass$snowyhydro$level
  pastEntries = do.call("rbind", lapply(pastJSON, entry.to.row))

  currentJSON = datJSON$year_current$snowyhydro$level
  currentEntries = do.call("rbind", lapply(currentJSON, entry.to.row))

  rbind(pastEntries, currentEntries)
}

# Fetch the data for 2019 and 2018
dat = fetch.data(current=2019, past=2018)

> head(dat)
              LakeName       Date Measurement Year Month Day
1       Lake Eucumbene 2018-01-01       46.40 2018    01  01
2       Lake Jindabyne 2018-01-01       85.80 2018    01  01
3 Tantangara Reservoir 2018-01-01       42.94 2018    01  01
4       Lake Eucumbene 2018-01-02       46.41 2018    01  02
5       Lake Jindabyne 2018-01-02       85.72 2018    01  02
6 Tantangara Reservoir 2018-01-02       42.98 2018    01  02
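
Because entry.to.row builds each data.frame from matrix(unlist(...)), every column in dat comes back as character. As a follow-up, here is a minimal sketch of converting the columns and reshaping to one row per day, which is closer to the table the question asks for. To keep it runnable without the network, it rebuilds the six rows printed above as a stand-in for dat; the name wide is only illustrative.

```r
# Stand-in for `dat`: the six rows printed above, kept as character
# because that is what fetch.data() returns.
dat <- data.frame(
  LakeName = rep(
    c("Lake Eucumbene", "Lake Jindabyne", "Tantangara Reservoir"),
    times = 2
  ),
  Date = rep(c("2018-01-01", "2018-01-02"), each = 3),
  Measurement = c("46.40", "85.80", "42.94", "46.41", "85.72", "42.98"),
  stringsAsFactors = FALSE
)

# Convert the text columns to types that can be plotted and summarised
dat$Date <- as.Date(dat$Date, format = "%Y-%m-%d")
dat$Measurement <- as.numeric(dat$Measurement)

# One row per day, one column per lake (base R only).
# Selecting the three columns first keeps Year/Month/Day, if present,
# from being spread across the lakes as well.
wide <- reshape(
  dat[, c("LakeName", "Date", "Measurement")],
  idvar = "Date",
  timevar = "LakeName",
  direction = "wide"
)
```

With the real dat from fetch.data the same steps should apply unchanged, assuming the response still has the three fields per lake that entry.to.row expects.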