rvest: trying to build a CSV / table from a website

Posted: 2017-11-12 01:26:53

Tags: r web-scraping rvest

I am trying to build a table / data frame / CSV file that looks like this:
( [City1, State1], OverallScore1, QualityOfLife1, Value1 )
( [City2, State2], OverallScore2, QualityOfLife2, Value2 )
...
( [CityN, StateN], OverallScoreN, QualityOfLifeN, ValueN )

So far I can only scrape one of these fields at a time (OverallScore, QualityOfLife, Value, or City, State), using code like this:

library(rvest)
# Read the page once and reuse the same object in the pipe below
live <- read_html("https://realestate.usnews.com/places/rankings/best-places-to-live")
live %>%
  html_node('#main-well') %>%
  html_node('.text-large-for-small-only') %>%
  html_text()
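One reason the snippet above returns a single value is that `html_node()` returns only the first element matching a selector, whereas `html_nodes()` returns every match. A minimal offline sketch; the HTML string and the class name `score` are made up and stand in for the real page:

```r
library(rvest)

# Made-up HTML; the class name "score" is hypothetical
doc <- read_html('<div id="main">
  <span class="score">7.1</span>
  <span class="score">6.9</span>
  <span class="score">6.8</span>
</div>')

# html_node(): only the first matching element
first_score <- doc %>% html_node(".score") %>% html_text()

# html_nodes(): every matching element, as a character vector
all_scores <- doc %>% html_nodes(".score") %>% html_text()

first_score  # "7.1"
all_scores   # "7.1" "6.9" "6.8"
```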

Is there a way to scrape all of the above fields at once?
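A common way to get all fields at once is to select each result's container first and then extract every field relative to that container, so values belonging to the same place always land in the same row. This is a swapped-in technique, not what the answer below does. The HTML string and the class names (`card`, `name`, `overall`, `quality`, `value`) are hypothetical placeholders, not the real U.S. News markup:

```r
library(rvest)

# Hypothetical markup: one container per ranked place; the second lacks a quality score
doc <- read_html('<div>
  <div class="card"><a class="name">Alpha, AA</a>
    <span class="overall">7.5</span><span class="quality">7.9</span><span class="value">6.4</span></div>
  <div class="card"><a class="name">Beta, BB</a>
    <span class="overall">7.2</span><span class="value">6.8</span></div>
</div>')

cards <- html_nodes(doc, ".card")

# html_node() on a node set yields exactly one result per card (a missing node
# when the field is absent), so every column has one entry per card
field <- function(selector) html_text(html_node(cards, selector), trim = TRUE)

rankings <- data.frame(
  heading       = field(".name"),
  overall_score = as.numeric(field(".overall")),
  life_quality  = as.numeric(field(".quality")),
  value         = as.numeric(field(".value")),
  stringsAsFactors = FALSE
)
rankings
```

The advantage over scraping each column separately is alignment: a missing field becomes `NA` in its own row instead of shifting the remaining values of that column up by one.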

1 answer:

Answer 0 (score: 0)

Please use this code in accordance with the guidelines mentioned in the comments.

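library(rvest)

# The CSS selectors below match the page layout at the time of this answer (2017);
# the site may have changed since
url <- "https://realestate.usnews.com/places/rankings/best-places-to-live"
page <- read_html(url)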

# Overall score: drop the " Overall Score" label and convert to numeric
overall_score <- html_nodes(page, css = ".text-tightest:nth-child(1) .text-coal") %>% html_text()
overall_score <- as.numeric(gsub(" Overall Score", "", overall_score))

# Quality of life score: strip whitespace and the label letters, leaving the number
life_quality <- html_nodes(page, css = ".text-tightest:nth-child(2) .text-coal") %>% html_text()
life_quality <- as.numeric(gsub("[\r\n QualityofLife]", "", life_quality))
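Note that `[\r\n QualityofLife]` is a regex character class: it deletes every individual character listed (carriage return, newline, space, and the letters of "QualityofLife"), not the phrase as a whole. It works here because every character of the label is in the class while digits and `.` are not. A base-R check on a made-up input string, next to the more explicit alternative of keeping only digits and the decimal point (an alternative, not what the answer uses):

```r
# Made-up text of the kind the selector is assumed to return
raw <- "\r\n 8.1 Quality of Life"

# Character class from the answer: deletes each listed character individually
via_class <- as.numeric(gsub("[\r\n QualityofLife]", "", raw))

# Alternative: delete everything that is not a digit or a decimal point
via_keep <- as.numeric(gsub("[^0-9.]", "", raw))

c(via_class, via_keep)  # 8.1 8.1
```

The `[^0-9.]` form does not depend on the label's spelling, but it would also keep any digits that appear in the label text, so it assumes the label itself contains none.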

# Value score: strip whitespace and the letters of "Value", leaving the number
value <- html_nodes(page, css = ".border-left-for-medium-up+ .text-tightest .text-coal") %>% html_text()
value <- as.numeric(gsub("[\r\n Value]", "", value))

# Each heading has the form "City, State"; split it into its two parts
heading <- html_nodes(page, css = ".heading-large a") %>% html_text(trim = TRUE)
city  <- sapply(heading, function(x) strsplit(x, split = ", ")[[1]][1])
state <- sapply(heading, function(x) strsplit(x, split = ", ")[[1]][2])
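Because `strsplit()` is vectorised, the headings can also be split in a single call and both parts taken from the result, instead of splitting every heading twice. A base-R sketch using made-up headings; the third one deliberately has no comma, to show that the missing state becomes `NA`:

```r
# Made-up headings in the "City, State" form the answer assumes
heading <- c("Alpha, AA", "Beta, BB", "Gamma")

# Split every heading once
parts <- strsplit(heading, split = ", ", fixed = TRUE)

# `[` picks the first or second piece; a missing piece comes back as NA
city  <- vapply(parts, `[`, character(1), 1)
state <- vapply(parts, `[`, character(1), 2)

data.frame(city, state, stringsAsFactors = FALSE)
```

`vapply()` also guarantees one string per heading and returns an unnamed vector, whereas `sapply()` uses the headings as names.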
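# Combine into one data frame, one row per place; row.names = NULL keeps default integer row names
real_estate <- data.frame(city, state, overall_score, life_quality, value,
                          row.names = NULL, stringsAsFactors = FALSE)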
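The question ultimately asks for a CSV file. Since each column is scraped separately, `data.frame()` will stop with an error (or silently recycle a shorter vector whose length divides the longer one) if a selector misses an entry. A minimal sketch of a hypothetical helper, `save_rankings()`, that checks the column lengths before writing with base R's `write.csv()`; the vectors in the usage are made up:

```r
# Hypothetical helper: refuse to write the table if the scraped columns are misaligned
save_rankings <- function(path, ...) {
  cols <- list(...)
  lens <- lengths(cols)
  if (length(unique(lens)) != 1) {
    stop("scraped columns have different lengths: ",
         paste(names(cols), lens, sep = "=", collapse = ", "))
  }
  df <- data.frame(cols, stringsAsFactors = FALSE)
  write.csv(df, path, row.names = FALSE)
  invisible(df)
}

# Usage with made-up vectors; with the real scrape these would be
# city, state, overall_score, life_quality and value
out <- tempfile(fileext = ".csv")
save_rankings(out,
              city = c("Alpha", "Beta"), state = c("AA", "BB"),
              overall_score = c(7.5, 7.2))
read.csv(out, stringsAsFactors = FALSE)
```

If the lengths already match, `write.csv(real_estate, "real_estate.csv", row.names = FALSE)` on the answer's data frame is enough.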