Is it possible to scrape the class itself using rvest?

Time: 2019-02-01 18:38:08

Tags: r web-scraping rvest

I want to scrape the class attribute itself from a website's HTML.

The HTML is:

<div class="table width-100 pad-left-none pad-right-none margin-bottom-md">
        <div class="tr">
            <div class="bold font-12 uppercase lt-grey letter-spacing-1 td">Customer Service</div>
            <div class="rating-static-indv rating-50 margin-top-none td"></div>
        </div>

        <!-- REVIEW RATING - QUALITY OF WORK -->
        <div class="tr margin-bottom-md">
            <div class="bold font-12 uppercase lt-grey letter-spacing-1 td">Quality of Work</div>
            <div class="rating-static-indv rating-50 margin-top-none td"></div>
        </div>

        <!-- REVIEW RATING - FRIENDLINESS -->
        <div class="tr margin-bottom-md">
            <div class="bold font-12 uppercase lt-grey letter-spacing-1 td">Friendliness</div>
            <div class="rating-static-indv rating-50 margin-top-none td"></div>
        </div>

        <!-- REVIEW RATING - PRICING -->
        <div class="tr margin-bottom-md">
            <div class="bold font-12 uppercase lt-grey letter-spacing-1 td">Pricing</div>
            <div class="rating-static-indv rating-30 margin-top-none td"></div>
        </div>

        <!-- REVIEW RATING - EXPERIENCE -->
        <div class="tr margin-bottom-md">
            <div class="td bold font-12 uppercase lt-grey letter-spacing-1">Overall Experience</div>
            <div class="rating-static-indv rating-50 margin-top-none td"></div>
        </div>
</div>

From this, I only want to scrape the classes that start with "rating-static-indv rating-...".

I tried:

library(rvest)

x <- NULL
k1 <- "https://www.dealerrater.com/dealer/Fox-Volkswagen-of-Rochester-Hills-review-5380/?filter=ONLY_POSITIVE#link"
url <- paste(k1)
review <- read_html(url)
states <- cbind(review %>% html_nodes("div.table.width-100.pad-left-none.pad-right-none.margin-bottom-md") %>% html_attr("class"))
x <- rbind(x, states)

However, this only returns the class "table width-100 pad-left-none pad-right-none margin-bottom-md". My required output is:

rating-static-indv rating-50 margin-top-none td
rating-static-indv rating-50 margin-top-none td
rating-static-indv rating-50 margin-top-none td
rating-static-indv rating-30 margin-top-none td
rating-static-indv rating-50 margin-top-none td

1 Answer:

Answer 0 (score: 0)

You are only selecting the "table" <div>. You need to select the <div>s that carry the class you want, which are nested inside it. Try:

review %>% html_nodes("div.table div.rating-static-indv") %>% 
  html_attr('class')
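To check the selector without hitting the live site, here is a minimal sketch that parses a trimmed copy of the question's HTML from a string (read_html accepts literal HTML) and applies the same "div.table div.rating-static-indv" selector. The extra steps are assumptions added for illustration, not part of the answer above: the regex that pulls the number out of the "rating-NN" token, and pairing each score with its label, which assumes every row has exactly one label <div> (class "bold") and one rating <div> so the two vectors line up. In rvest 1.0 and later, html_nodes() is superseded by html_elements(), but it still works.

```r
library(rvest)

# Trimmed copy of the HTML from the question, parsed from a string
# instead of being fetched from the live page
html <- '
<div class="table width-100 pad-left-none pad-right-none margin-bottom-md">
  <div class="tr">
    <div class="bold font-12 uppercase lt-grey letter-spacing-1 td">Customer Service</div>
    <div class="rating-static-indv rating-50 margin-top-none td"></div>
  </div>
  <div class="tr margin-bottom-md">
    <div class="bold font-12 uppercase lt-grey letter-spacing-1 td">Pricing</div>
    <div class="rating-static-indv rating-30 margin-top-none td"></div>
  </div>
</div>'

page <- read_html(html)

# Select the rating <div>s inside the table <div> and read their class attribute
classes <- page %>%
  html_nodes("div.table div.rating-static-indv") %>%
  html_attr("class")

# Assumption: the score is the number in the "rating-NN" token,
# e.g. "rating-50" -> 50 (the greedy .* skips "rating-static")
scores <- as.integer(sub(".*rating-([0-9]+).*", "\\1", classes))

# Assumption: each row's label is the <div> with class "bold"
labels <- page %>%
  html_nodes("div.table div.tr div.bold") %>%
  html_text(trim = TRUE)

# scores is c(50L, 30L); labels is c("Customer Service", "Pricing")
ratings <- data.frame(label = labels, score = scores, stringsAsFactors = FALSE)
print(ratings)
```

If only the raw class strings are needed, as in the question, classes already holds them and the last three steps can be dropped.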