How to extract names from this HTML element using rvest

Time: 2017-10-10 20:41:54

Tags: html r

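I have searched through many posts but could not find an example like mine. I followed the SelectorGadget example from the rvest introduction (https://blog.rstudio.com/2014/11/24/rvest-easy-web-scraping-with-r/), adapting it to my use case. None of the selectors SelectorGadget suggested gave me what I need. I need to extract the reviewer name from each review on the page. In the page source, a name looks like this: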

<span itemprop="name" class="sg_selected">This Name</span>

Here is my code. Ideally, it should give me the individual names on this web page.

    library(rvest)
    library(dplyr)

    dsa_reviews <-
      read_html("https://www.directsalesaid.com/companies/traveling-vineyard#reviews")

    review_names <- html_nodes(dsa_reviews,'#reviews span')

    df <- bind_rows(lapply(xml_attrs(review_names), function(x) {
      data.frame(as.list(x), stringsAsFactors = FALSE)
    }))

Apologies if this is a duplicate question or poorly formatted. Feel free to request any necessary edits.

1 Answer:

Answer 0: (score: 3)

Here it is. Select the elements by their itemprop attribute and extract their text:

library(rvest)
library(dplyr)

dsa_reviews <- 
  read_html("https://www.directsalesaid.com/companies/traveling-vineyard#reviews")

html_nodes(dsa_reviews,'[itemprop=name]') %>% 
  html_text() 

 [1] "Traveling Vineyard"     ""                      
 [3] "Kiersten Ray-kuhn"      "Miley Sama"            
 [5] " Nancy Shawtone "       "Amanda Moore"          
 [7] "Matt"                   "Kathy Barzal"          
 [9] "Lesa Brinker"           "Lori Stryker"          
[11] "Jeanette Holtman"       "Penny Notarnicola"     
[13] "Laura Ann"              "Nicole Lafave"         
[15] "Gretchen Hess Miller"   "Gina Devine"           
[17] "Ashley Lawton Converse" "Morgan Williams"       
[19] "Angela Baston Mckeone"  "Traci Feshler"         
[21] "Kisha Marshall Dlugos"  "Jody Cole Dvorak" 
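
The output above also contains the company name, an empty string, and a name padded with spaces. A minimal sketch of one way to tidy this up follows, run against a small inline HTML snippet rather than the live site. The snippet is hypothetical: it assumes the reviewer names sit inside an element with id "reviews" (taken from the selector in the question) and that the company name sits outside it, which may not match the real page.

```r
library(rvest)

# Hypothetical stand-in for the page; the real markup may differ.
page <- read_html('<html><body>
  <h1><span itemprop="name">Traveling Vineyard</span></h1>
  <div id="reviews">
    <span itemprop="name">Kiersten Ray-kuhn</span>
    <span itemprop="name"> Nancy Shawtone </span>
    <span itemprop="name"></span>
    <span itemprop="reviewBody">Great company</span>
  </div>
</body></html>')

# Extracting attributes (as in the question) yields attribute values,
# not the visible names.
span_props <- page %>%
  html_nodes("#reviews span") %>%
  html_attr("itemprop")

# Restrict the attribute selector to the (assumed) reviews container,
# then take the text with surrounding whitespace trimmed.
reviewer_names <- page %>%
  html_nodes("#reviews [itemprop=name]") %>%
  html_text(trim = TRUE)

# Drop entries that are empty after trimming.
reviewer_names <- reviewer_names[reviewer_names != ""]
reviewer_names
```

If the real page does not nest the names under #reviews, the trimming and empty-string filter still apply to the result of the '[itemprop=name]' selector on its own.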

Colin