Question

I have an internal company HTML page that contains div tags in the following format:

<div id="B4_6_2019">
<div id="B3_6_2019">

I want to extract all the id names, so the final result would be B4_6_2019 B3_6_2019.

How can I do this? (The id names are all dates.)

Answer 1

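You can also try an attribute = value CSS selector with the ends-with operator ($=), which matches a substring against the end of the id value string: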

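library(rvest)

# "url" is a placeholder for the address of the page
page <- read_html("url")

# Select every element whose id ends in "_2019", then pull out the id values
id <- page %>%
  html_nodes("[id$='_2019']") %>%
  html_attr("id")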
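To try the selector without access to the internal page, here is a minimal sketch that parses an inline HTML string instead. The markup, including the extra "footer" div, is a hypothetical stand-in and not the real page.

```r
library(rvest)

# Hypothetical stand-in for the internal page (not the real markup)
html <- '<html><body>
  <div id="B4_6_2019"></div>
  <div id="B3_6_2019"></div>
  <div id="footer"></div>
</body></html>'

page <- read_html(html)

# [id$='_2019'] keeps only elements whose id ends in "_2019"
ids <- page %>%
  html_nodes("[id$='_2019']") %>%
  html_attr("id")

print(ids)
# [1] "B4_6_2019" "B3_6_2019"
```

Note that the selector matches any element with such an id, not only div tags; writing it as "div[id$='_2019']" would restrict it to divs.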

Answer 2

Try:

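library(dplyr)
library(rvest)

# url holds the address of the page
url %>%
  read_html() %>%
  html_nodes("div") %>%                       # every div on the page
  html_attr("id") %>%                         # its id (NA when absent)
  grep("^B\\d+_\\d+_\\d+", ., value = TRUE)   # keep ids shaped like B4_6_2019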
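The same idea can be checked on an inline HTML string. This is only a sketch: the markup is hypothetical, and the trailing `$` in the pattern is an added assumption that makes the whole id match rather than just its beginning.

```r
library(rvest)

# Hypothetical markup: two date-style ids, one unrelated id, one div with no id
html <- '<html><body>
  <div id="B4_6_2019"></div>
  <div id="B3_6_2019"></div>
  <div id="sidebar"></div>
  <div></div>
</body></html>'

# html_attr() returns NA for the div that has no id attribute
all_ids <- read_html(html) %>%
  html_nodes("div") %>%
  html_attr("id")

# grep() never matches NA, so missing ids are dropped along with "sidebar"
date_ids <- grep("^B\\d+_\\d+_\\d+$", all_ids, value = TRUE)

print(date_ids)
# [1] "B4_6_2019" "B3_6_2019"
```

Compared with the ends-with selector, the regular expression does not depend on the year, so it would keep working for ids from other years.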