I have two links that I want to scrape with R:
https://bettereducation.com.au/school/Primary/vic/vic_top_primary_schools.aspx
https://reiv.com.au/market-insights/all-suburbs
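I'm stuck on extracting the HTML table data and moving to the next page, because I can't tell whether the table is rendered by JavaScript or embedded in an iframe.
I'm hoping there is some way in R to imitate a user who clicks "Next" and keeps collecting data. A further problem is that most tools assume the URL changes from page to page and simply jump from one link to the next, whereas the two URLs above stay the same while the table pages through the data I need.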
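One common reason a URL stays fixed while a table pages is an ASP.NET WebForms postback, where "Next" re-submits the page's hidden form fields to the same address. The `ctl00_..._GridView1_...` element id used in the code in this post hints at that, but it is an assumption and has not been checked against either site. The sketch below only shows how such a request body could be assembled with rvest; the HTML fragment, the field values, and the `__EVENTTARGET` / `__EVENTARGUMENT` values are all hypothetical.

```r
library(rvest)

# Hypothetical fragment of an ASP.NET WebForms page; the values are made up
sample_html <- '<form method="post" action="vic_top_primary_schools.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dummy-viewstate" />
  <input type="hidden" name="__EVENTVALIDATION" value="dummy-validation" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
</form>'
page <- read_html(sample_html)

# Collect every hidden input as a named list of form fields
hidden <- html_elements(page, "input[type='hidden']")
fields <- as.list(html_attr(hidden, "value"))
names(fields) <- html_attr(hidden, "name")

# Assumption: the grid pages through a postback with these values;
# the real control name and argument must be read from the live page
fields[["__EVENTTARGET"]] <- "ctl00$ContentPlaceHolder1$GridView1"
fields[["__EVENTARGUMENT"]] <- "Page$2"

# With a live page, the fields could then be posted back to the same URL:
# next_page <- httr::POST(url_base, body = fields, encode = "form")
# next_html <- read_html(httr::content(next_page, as = "text"))
```

If the table turns out to be drawn by JavaScript from a separate data request instead, this approach would not apply and a browser-driving tool such as RSelenium would be the more likely fit.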
Here is my code. Feel free to point out better libraries or approaches.
# Packages used by this script
packages_needed <- c("rvest", "stringr", "rebus", "lubridate")

# Install any packages that are not already available
missing_packages <- setdiff(packages_needed, rownames(installed.packages()))
if (length(missing_packages) > 0) {
  message("These were not found: ", paste(missing_packages, collapse = ", "))
  install.packages(missing_packages)
}
# Attach each package by name
for (pkg in packages_needed) {
  library(pkg, character.only = TRUE)
}
url_base <- "https://bettereducation.com.au/school/Primary/vic/vic_top_primary_schools.aspx"

# Open a session (html_session() was renamed session() in rvest 1.0.0)
session <- html_session(url_base)

# Download and parse the first page
read_website <- read_html(url_base)

# An id selector matches at most one element, so this returns a single
# school link rather than the whole table
school_html <- html_nodes(read_website,
                          "#ctl00_ContentPlaceHolder1_GridView1_ctl02_LinkSchool")
school_text <- html_text(school_html)
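A minimal sketch of pulling every row instead of one link, assuming the table's id is the selector above with the row suffix dropped (`ctl00_ContentPlaceHolder1_GridView1`) and that each school link's id ends in `_LinkSchool`. Both are guesses from the one id in this post, and the HTML and school names below are invented stand-ins so the example runs offline. It uses `html_element()` / `html_elements()`, which require rvest 1.0.0 or later.

```r
library(rvest)

# Hypothetical stand-in for one page of results; the real markup may differ
sample_html <- '<table id="ctl00_ContentPlaceHolder1_GridView1">
  <tr><th>School</th><th>Postcode</th></tr>
  <tr><td><a id="ctl00_ContentPlaceHolder1_GridView1_ctl02_LinkSchool">Example Primary School</a></td><td>3000</td></tr>
  <tr><td><a id="ctl00_ContentPlaceHolder1_GridView1_ctl03_LinkSchool">Sample Primary School</a></td><td>3001</td></tr>
</table>'
page <- read_html(sample_html)

# Convert the whole table to a data frame in one step
grid <- html_element(page, "#ctl00_ContentPlaceHolder1_GridView1")
schools <- html_table(grid)

# Or select every school link by the shared ending of its id
links <- html_elements(page, "a[id$='_LinkSchool']")
school_names <- html_text(links)
```

Here `schools` holds one row per school and `school_names` holds just the link text, so either could be collected page by page once paging works.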
Scraping gurus, please help!