Scraping tables from websites with R when the table type is unknown

Asked: 2018-10-25 04:21:45

Tags: r web-scraping

I have two links that I want to scrape with R:

  1. https://bettereducation.com.au/school/Primary/vic/vic_top_primary_schools.aspx

  2. https://reiv.com.au/market-insights/all-suburbs

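I am stuck on getting the HTML table data and moving to the next page, because I don't know whether the table is rendered by JavaScript or embedded in an iframe.

I'm hoping there is some way in R to mimic a user clicking "Next" and keep collecting the data. Another problem is that most tools rely on the URL changing, so they jump from one link to the next; with the two links above, the URL does not change as you page through the information you need.

Here is my code. Please feel free to point out better libraries or approaches.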

# Packages used in this script
packages_needed <- c("rvest", "stringr", "rebus", "lubridate")

# Install any packages that are not already present
missing_packages <- setdiff(packages_needed, rownames(installed.packages()))
if (length(missing_packages) > 0) {
  print("These were not found")
  # Print explicitly: a bare expression in the middle of a block is not auto-printed
  print(missing_packages)
  install.packages(missing_packages)
}

# Attach each package by name
for (pkg in packages_needed) {
  library(pkg, character.only = TRUE)
}

url_base <- "https://bettereducation.com.au/school/Primary/vic/vic_top_primary_schools.aspx"

# Session opened for later navigation (not used further yet)
session <- html_session(url_base)

# Download and parse the first page
read_website <- read_html(url_base)

# This ID selector targets the school link in a single table row (ctl02)
school_html <- html_nodes(read_website,
                          "#ctl00_ContentPlaceHolder1_GridView1_ctl02_LinkSchool")

school_text <- html_text(school_html)
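
One possible direction for the first link, sketched under assumptions rather than verified against the site: the element ID in the selector (ctl00_ContentPlaceHolder1_GridView1_...) looks like an ASP.NET WebForms GridView, and such grids commonly page by posting the form back to the same URL instead of changing it. If that holds here, the next page could be requested by resending the page's hidden form fields together with an `__EVENTTARGET` and an `__EVENTARGUMENT` of the form `Page$N`. The helper names `postback_body` and `fetch_grid_page`, the control name `ctl00$ContentPlaceHolder1$GridView1`, and the use of httr are all assumptions made for illustration.

```r
library(rvest)  # read_html(), html_nodes(), html_attr()
library(httr)   # POST(), content(), stop_for_status()

# Build the form body for an ASP.NET-style postback that asks for one grid page.
# `grid_target` is an assumed control name, not confirmed against the site.
postback_body <- function(page, grid_target, page_number) {
  inputs <- html_nodes(page, "input[type='hidden']")
  field_names <- html_attr(inputs, "name")
  field_values <- html_attr(inputs, "value")
  keep <- !is.na(field_names)                # skip inputs without a name
  field_values[is.na(field_values)] <- ""    # missing value attribute -> empty string
  body <- as.list(setNames(field_values[keep], field_names[keep]))
  body[["__EVENTTARGET"]] <- grid_target
  body[["__EVENTARGUMENT"]] <- paste0("Page$", page_number)
  body
}

# Replay the postback against the same URL and parse the returned HTML.
fetch_grid_page <- function(url, page, grid_target, page_number) {
  resp <- POST(url,
               body = postback_body(page, grid_target, page_number),
               encode = "form")
  stop_for_status(resp)
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}

# Hypothetical usage (needs network access; control and table IDs are assumed):
# page1 <- read_html(url_base)
# page2 <- fetch_grid_page(url_base, page1, "ctl00$ContentPlaceHolder1$GridView1", 2)
# html_table(html_node(page2, "#ctl00_ContentPlaceHolder1_GridView1"))
```

Each response carries fresh hidden fields, so a loop over pages would pass the most recently fetched page into the next call. This sketch does not cover tables that are built entirely by JavaScript in the browser, which may be the case for the second link; those usually call for a browser-driving tool such as RSelenium.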

Scraping gurus, please help!

0 answers:

No answers yet.