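I am trying to use the rvest package to extract data on the locations of invasive plant species from the CABI Invasive Species Compendium.
After going through a few tutorials, I expected to be able to scrape the data from the table fairly easily. However, I keep running into trouble.
Say I want the location data for the species Brassica tournefortii. I should be able to use the code below, which follows the technique outlined here, to get the details of where the species has been recorded.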
library(rvest)
isc<-read_html("http://www.cabi.org/isc/datasheet/50069")
isc %>%
html_node("#toDistributionTable td:nth-child(1)") %>%
html_text()
However, when I run this code I get the error
Error: No matches
I am new to web scraping. Am I doing something wrong?
Answer 0 (score: 8)
First off, I wish I could upvote you more. Finally, a scraping question that isn't $SPORTSBALL or $MONEY related! :-)
That site is evil. It uses embedded namespaces that need to be dealt with, which also means using the xml2 package:
library(rvest)
library(xml2)
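# Parse the datasheet page
isc <- read_html("http://www.cabi.org/isc/datasheet/50069")
# Collect the namespaces declared in the document
ns <- xml_ns(isc)
# Take the text of the first cell of every row in the distribution table
xml_text(xml_find_all(isc, xpath = "//div[@id='toDistributionTable']/table/tbody/tr/td[1]", ns = ns))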
## [1] "ASIA" "Azerbaijan"
## [3] "Bhutan" "China"
## [5] "-Tibet" "India"
## [7] "-Delhi" "-Indian Punjab"
## [9] "-Rajasthan" "-Uttar Pradesh"
## [11] "Iran" "Iraq"
## [13] "Israel" "Jordan"
## [15] "Kuwait" "Lebanon"
## [17] "Oman" "Pakistan"
## [19] "Qatar" "Saudi Arabia"
## [21] "Syria" "Turkey"
## [23] "Turkmenistan" "United Arab Emirates"
## [25] "Uzbekistan" "Yemen"
## [27] "AFRICA" "Algeria"
## [29] "Egypt" "Libya"
## [31] "Morocco" "South Africa"
## [33] "Tunisia" "NORTH AMERICA"
## [35] "Mexico" "USA"
## [37] "-Arizona" "-California"
## [39] "-Nevada" "-New Mexico"
## [41] "-Texas" "-Utah"
## [43] "SOUTH AMERICA" "Chile"
## [45] "EUROPE" "Belgium"
## [47] "Cyprus" "Denmark"
## [49] "France" "Greece"
## [51] "Ireland" "Italy"
## [53] "Spain" "Sweden"
## [55] "UK" "-England and Wales"
## [57] "-Scotland" "OCEANIA"
## [59] "Australia" "-Australian Northern Territory"
## [61] "-New South Wales" "-Queensland"
## [63] "-South Australia" "-Tasmania"
## [65] "-Victoria" "-Western Australia"
## [67] "New Zealand"
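
To illustrate how namespaces affect XPath lookups in xml2, here is a minimal sketch on a toy document. It does not touch the CABI page: the inline XML, the namespace URI "http://example.com/ns" and the variable names are all made up for the example. It assumes only the xml2 functions read_xml(), xml_ns(), xml_find_all(), xml_text() and xml_ns_strip() (the last one needs xml2 1.2.0 or later).

```r
library(xml2)

# Toy document with a default namespace (hypothetical URI, not from the CABI site)
txt <- '<root xmlns="http://example.com/ns">
  <table>
    <tr><td>ASIA</td><td>x</td></tr>
    <tr><td>Azerbaijan</td><td>y</td></tr>
  </table>
</root>'
doc <- read_xml(txt)

# An un-prefixed XPath finds nothing, because every element is in the namespace
plain <- xml_find_all(doc, "//td")
length(plain)

# xml_ns() names the default namespace "d1"; use that prefix in the XPath
ns <- xml_ns(doc)
first_cells <- xml_text(xml_find_all(doc, "//d1:tr/d1:td[1]", ns))
first_cells

# Alternative: strip the default namespace, then query without prefixes
doc2 <- read_xml(txt)
xml_ns_strip(doc2)
stripped_cells <- xml_text(xml_find_all(doc2, "//tr/td[1]"))
stripped_cells
```

Whether you pass the namespace object with prefixed names or strip the namespace first is mostly a matter of taste; stripping keeps the XPath expressions shorter, but it modifies the parsed document in place.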