I am trying to scrape information from a Google Scholar web page:
https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:materials_science
library(rvest)
htmlfile<-"https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:materials_science"
g_interest<- read_html(htmlfile) %>% html_nodes("div.gsc_oai_int") %>% html_text()
I get the following result:
[1] "Quantum Chemistry Electronic Structure Condensed Matter Physics Materials Science Nanotechnology "
[2] "density functional theory first principles calculations many body theory condensed matter physics materials science "
[3] "chemistry materials science physics nanotechnology "
[4] "Materials Science Nanotechnology Chemistry Physics "
[5] "Physics Theoretical Physics Condensed Matter Theory Materials Science Nanoscience "
[6] "Materials Science Quantum Chemistry Fiber Optic Sensors Geophysics "
[7] "Chemical Physics Condensed Matter Materials Science Magnetic Properties NMR "
[8] "Materials Science "
[9] "Materials Science Physics "
[10] "Physics Materials Science Theoretical Physics Nanoscience "
However, I would like to get a result like this:
[1] "Quantum Chemistry; Electronic Structure; Condensed Matter Physics; Materials Science; Nanotechnology"
......
Any suggestions on how to separate the results with ";"?
Answer 0 (score: 0)
You can use the purrr and stringr packages: first extract all the parent nodes, then join the text of the individual nodes inside each one. Calling html_text() on the parent div returns the text of all its children as a single string, so the separator has to be inserted per child node:
library(rvest)
library(purrr)
library(stringr)
htmlfile <- "https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:materials_science"
# One node per author entry
content_nodes <- read_html(htmlfile) %>% html_nodes("div.gsc_oai_int")
# For each author, extract the individual interest nodes and join their text with ";"
map_chr(content_nodes, ~ .x %>%
  html_nodes(".gsc_oai_one_int") %>%
  html_text() %>%
  str_c(collapse = ";"))
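Because the live page may change or refuse automated requests, the same approach can be tried offline first. The following is a minimal sketch, not the real page: sample_html is a hypothetical stand-in that only reuses the two class names from the code above, and the "; " separator (with a space) is an assumption based on the desired output in the question.

```r
library(rvest)
library(purrr)
library(stringr)

# Hypothetical stand-in for the search results page: two author entries,
# each holding one child element per research interest.
sample_html <- '
<div class="gsc_oai_int">
  <a class="gsc_oai_one_int">Materials Science</a>
  <a class="gsc_oai_one_int">Nanotechnology</a>
</div>
<div class="gsc_oai_int">
  <a class="gsc_oai_one_int">Physics</a>
</div>'

# read_html() treats a string containing markup as literal HTML
page <- read_html(sample_html)
author_nodes <- html_nodes(page, "div.gsc_oai_int")

# Join the interests of each author separately, so that multi-word
# interests such as "Materials Science" stay intact
interests <- map_chr(author_nodes, function(node) {
  node %>%
    html_nodes(".gsc_oai_one_int") %>%
    html_text(trim = TRUE) %>%
    str_c(collapse = "; ")
})

print(interests)
```

Working on the child nodes rather than splitting the combined text on spaces is what keeps multi-word interests together; an author with a single interest simply yields that interest with no separator.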