Scraping data from the ConsumerAffairs website

Posted: 2017-05-04 03:49:37

Tags: r text-mining rvest

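I'm trying to use rvest to download reviews from http://consumeraffairs.com. I can download the review text, but I can't get the ratings because they are displayed as images. Is there a way to get the ratings as numbers? I used SelectorGadget to get the CSS selector.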

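library(rvest)

comcast <- read_html("https://www.consumeraffairs.com/cable_tv/comcast_cable.html")

# CSS selector found with SelectorGadget; because the stars are images,
# html_text() on these nodes does not yield the numeric ratings
rating <- comcast %>%
  html_nodes(".star-rc span") %>%
  html_text()
rating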

1 answer:

Answer 0 (score: 1)

If you inspect the page source at that URL, you'll see that each rating is stored in a meta tag like this:

<meta itemprop="ratingValue" content="1">

So you can get a vector of ratings (at least for the first page) by reading the content attribute of those tags:

comcast %>% 
  html_nodes("meta[itemprop=ratingValue]") %>% 
  html_attr("content")

 [1] "3" "1" "1" "5" "1" "1" "2" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "2" "1" "1" "2"
[25] "1" "1" "1" "3" "1" "1"
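Note that html_attr() returns a character vector, so the ratings still need converting before you can compute with them. The sketch below shows that step on a small, self-contained HTML string so it runs without network access. Only the `<meta itemprop="ratingValue" content="...">` format comes from the page source described above; the surrounding `div class="review"` markup and review texts are hypothetical stand-ins, and the live site's structure may have changed since this was posted.

```r
library(rvest)

# Hypothetical stand-in for one page of reviews. Only the meta tag format
# is taken from the real page; the div/p structure is assumed.
page <- read_html('
<html><body>
  <div class="review"><meta itemprop="ratingValue" content="3"><p>Okay service.</p></div>
  <div class="review"><meta itemprop="ratingValue" content="1"><p>Slow support.</p></div>
  <div class="review"><meta itemprop="ratingValue" content="5"><p>No complaints.</p></div>
</body></html>')

# Same selector as the answer; html_attr() gives strings, so convert to numeric
ratings <- page %>%
  html_nodes("meta[itemprop=ratingValue]") %>%
  html_attr("content") %>%
  as.numeric()

ratings        # numeric vector: 3 1 5
mean(ratings)  # 3
```

On the real page, the same pipeline applied to `comcast` instead of `page` would give the ratings as numbers rather than strings.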