Question

I have an HTML file in the following format:

<div id='1' class='location element' style='width:100px; top:5068px; left: 3332px;'><div class='position'></div><div class='time'></div><div class='age'></div>Name</div>

I want to retrieve the class string of the outer div (in this case 'location') as well as the name.

So far I can retrieve the name by selecting the div by its ID:

library(rvest)

html_file %>%
  html_nodes("#1") %>%
  html_text()

How can I retrieve the first attribute, 'class'? Thanks.

Answer 1

Use html_attr():

library(rvest)
library(dplyr)
html_file %>%
    html_nodes("#1") %>%
    html_attr("class")

[1] "location element"
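For a self-contained check, the sketch below parses the sample markup from the question directly with read_html() (assumption: the real file would be read from a path or URL instead of a literal string) and pulls both the name and the class from the same node.

```r
library(rvest)

# Sample markup copied from the question; read_html() also accepts a literal HTML string
html_file <- read_html("<div id='1' class='location element' style='width:100px; top:5068px; left: 3332px;'><div class='position'></div><div class='time'></div><div class='age'></div>Name</div>")

# Select the outer div once and reuse it
node <- html_nodes(html_file, "#1")

# Text content of the div; the nested divs are empty, so only "Name" remains
name <- html_text(node)

# Value of the class attribute
cls <- html_attr(node, "class")

data.frame(name = name, class = cls)
```

Selecting the node once and then calling html_text() and html_attr() on it avoids running the same CSS selector twice.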

Note: if you use html_attrs() instead, you get all of the attributes and can work from there:

library(rvest)
library(dplyr)
html_file %>%
    html_nodes("#1") %>%
    html_attrs()

[[1]]
                                      id                                    class 
                                     "1"                       "location element" 
                                   style 
"width:100px; top:5068px; left: 3332px;"
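Since the question asks for just 'location', one possible way to get there from the html_attrs() result is sketched below. It assumes the first whitespace-separated token of the class attribute is the value wanted; the variable names are illustrative only.

```r
library(rvest)

html_file <- read_html("<div id='1' class='location element' style='width:100px; top:5068px; left: 3332px;'><div class='position'></div><div class='time'></div><div class='age'></div>Name</div>")

# html_attrs() returns a list with one named character vector per matched node
attrs <- html_attrs(html_nodes(html_file, "#1"))[[1]]

# Pick a single attribute out of the named vector
cls <- attrs[["class"]]

# The class attribute holds space-separated tokens; keep the first one
first_class <- strsplit(cls, " ", fixed = TRUE)[[1]][1]
first_class
```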

R: Scraping some information from HTML
