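I need to extract the lists from many web pages like this one: https://fossilplants.info/genus.htm?page=3. I have tried several R packages (such as rvest and XML), but I could not work out how to get them to work. Can anyone help me? Many thanks.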
Answer 0 (score: 2)
We can use rvest like this:
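library(rvest)  # read_html(), html_nodes(), html_text(), and the %>% pipe
library(purrr)  # map() and flatten_chr(), used for the multi-page version

url <- 'https://fossilplants.info/genus.htm?page=3'

# Select the <h1> elements, which hold the entries shown in the output.
url %>%
  read_html() %>%
  html_nodes('h1') %>%
  html_text() %>%
  gsub('[\r\n\t]', '', .)  # strip carriage returns, newlines, and tabs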
# [1] "Genus Abies-pollenites Thierg. in Raatz Abh. Preuss. Geol. Landesanst., Neue Folge, (183): 16. 26 Jan 1938"
# [2] "Genus Abieticedripites Maljavk. Trudy Vsesoyuzn. Neft. Nauchno-Issl. Geol.-Razved. Inst., N. S., (119): 103. 11 Jul 1958"
# [3] "Genus Abietineae-pollenites R. Potonié Palaeontographica, Abt. B, Paläophytol., 91(5-6): 144, 145. Apr 1951"
# [4] "Genus Abietineaepollenites R. Potonié in Delcourt, Sprumont Mém. Soc. Belge Géol., N. Sér. 4°, (5): 51. 1955"
# [5] "Genus Abietipites Wodehouse Bull. Torrey Bot. Club, 60(7): 491. Oct 1933"
# [6] "Genus Abietites Maljavk. Trudy Vsesoyuzn. Neft. Nauchno-Issl. Geol.-Razved. Inst., N. S., (231): 142. 10 Aug 1964"
# [7] "Genus Abietites Hising. Lethaea Svecica 110. 7 Dec 1836"
# [8] "Genus Abietopitys Kräusel Beitr. Geol. Erforsch. Deutsch. Schutzgeb., (20): 32. 11 Aug 1928"
# [9] "Unranked Abietosaccites Erdtman Svensk Bot. Tidskr., 41(1): 110. 26 Mar 1947"
#[10] "Genus Abietoxylon 73. "
If you want to do this for multiple pages, build each URL from its page number and apply the same steps to each one.
# Scrape pages 1 to 3 and combine the results into one character vector.
map(1:3, ~{
  url <- sprintf('https://fossilplants.info/genus.htm?page=%d', .x)
  url %>%
    read_html() %>%
    html_nodes('h1') %>%
    html_text() %>%
    gsub('[\r\n\t]', '', .)
}) %>% flatten_chr()
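When scraping many pages, it may help to pause between requests and to keep going if a single page fails. The following is only a minimal sketch of that idea, not part of the answer above: the helper names `extract_headings` and `scrape_page` are hypothetical, the one-second delay is an assumption rather than a documented requirement of the site, and the whitespace handling collapses runs of whitespace to a single space instead of deleting `\r\n\t` outright.

```r
library(rvest)

# Hypothetical helper: return the cleaned <h1> text of an already-parsed page.
extract_headings <- function(doc) {
  headings <- html_text(html_nodes(doc, "h1"))
  # Collapse runs of whitespace (including \r, \n, \t) and trim both ends.
  trimws(gsub("[[:space:]]+", " ", headings))
}

# Hypothetical helper: fetch one page; on error, warn and return an empty vector.
scrape_page <- function(page, delay = 1) {
  url <- sprintf("https://fossilplants.info/genus.htm?page=%d", page)
  Sys.sleep(delay)  # assumed politeness delay between requests
  tryCatch(
    extract_headings(read_html(url)),
    error = function(e) {
      warning(sprintf("page %d failed: %s", page, conditionMessage(e)))
      character(0)
    }
  )
}

# Offline check on an inline HTML string, so no request is made here.
sample_doc <- read_html(
  "<html><body><h1>\n\tGenus  Abies\r\n</h1><h1>Genus Abietites</h1></body></html>"
)
sample_headings <- extract_headings(sample_doc)
```

With these helpers, `unlist(lapply(1:3, scrape_page))` should give one character vector for pages 1 to 3, with any failed page contributing no entries.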