Question

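I've been trying to figure out string parsing, but it seems to be over my head. I'd like my end product to be a character vector of the "Species Name" column on this webpage. So far I have something like this: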

library(httr)  # provides GET()
library(XML)   # provides htmlParse()

url <- 'http://ebird.org/ebird/country/CR?yr=all'
doc <- htmlParse(rawToChar(GET(url)$content))
string <- as(doc, "character")

I found that the species names appear like this (in this case, White-bellied Storm-Petrel):

<td headers="s" class="species-name">White-bellied Storm-Petrel</td>

How can I collect all of these into a list?

Answer 1

We can do this with rvest:

library(rvest)
# `url` is the eBird address defined in the question
species <- read_html(url) %>%
              html_nodes('td.species-name') %>%
              html_text()
head(species)
#[1] "Common Pauraque"           "Roadside Hawk"             "Inca Dove"
#[4] "Common Ground-Dove"        "White-winged Dove"        
#[6] "Rufous-tailed Hummingbird"
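
If you would rather stay with the XML package from the question, an XPath query can pull out the same cells without treating the page as one big string. The following is only a minimal sketch: the inline `html` fragment is a made-up stand-in for the real page, and it assumes the live page uses the same `<td class="species-name">` markup shown in the question.

```r
library(XML)

# Hypothetical two-row fragment standing in for the real eBird page
# (assumption: the live markup matches the <td> shown in the question).
html <- '<table><tr>
<td headers="s" class="species-name">White-bellied Storm-Petrel</td>
</tr><tr>
<td headers="s" class="species-name">Inca Dove</td>
</tr></table>'

# asText = TRUE tells htmlParse() the input is HTML content, not a file or URL
doc <- htmlParse(html, asText = TRUE)

# Select every <td> whose class attribute is exactly "species-name"
# and extract its text with xmlValue(); the result is a character vector
species <- xpathSApply(doc, "//td[@class='species-name']", xmlValue)
species
```

With the `doc` object already built in the question, only the `xpathSApply()` line should be needed. Note that the XPath test matches the class attribute exactly, so it would miss cells carrying additional classes, whereas the CSS selector `td.species-name` would still match them.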
