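I'm trying to scrape a table, but I can only get it to return the display text of the hyperlinks. I want the URLs rather than the values shown in the table. I've worked out how to do this for a single hyperlink, but that means going through and getting the XPath for every one. Is there a faster way?
Here is the code I've been using: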
library(rvest)
url <- read_html("https://coinmarketcap.com/coins/views/all/")
cryptocurrencies <- url %>%
  html_nodes(xpath = '//*[@id="currencies-all"]') %>%
  html_table(fill = TRUE)
cryptocurrencies <- cryptocurrencies[[1]]
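I suspect there is an argument to html_nodes that would let me extract the 'href', but I can't work out how to do it. Thanks.
Answer 0 (score: 1)
First, you need to use html_attr() to get an attribute of each node; in your case the attribute is href.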
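# parse the page once, then pull the href attribute from each link node
page <- read_html("https://coinmarketcap.com/coins/views/all/")
relative_paths <- page %>%
  html_nodes(".currency-name-container") %>%
  html_attr("href") # note: these are relative paths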
relative_paths[1:3]
"/currencies/bitcoin/" "/currencies/ethereum/" "/currencies/ripple/"
Once you have the relative paths, you can use jump_to() or follow_link() to visit each page and scrape it.
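# display the first three results
# (rvest >= 1.0.0 renames html_session() and jump_to() to session() and session_jump_to())
for (path in relative_paths[1:3]) {
  current_session <- html_session("https://coinmarketcap.com/coins/views/all/") %>%
    jump_to(path)
  # do something here
  print(current_session$url)
}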
[1] "https://coinmarketcap.com/currencies/bitcoin/"
[1] "https://coinmarketcap.com/currencies/ethereum/"
[1] "https://coinmarketcap.com/currencies/ripple/"
Alternatively, you can build the absolute paths:
#or get absolute path
absolute_path <- paste0("https://coinmarketcap.com",relative_paths)
absolute_path[1:3]
[1] "https://coinmarketcap.com/currencies/bitcoin/" "https://coinmarketcap.com/currencies/ethereum/" "https://coinmarketcap.com/currencies/ripple/"
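As a possible alternative to pasting the host name on by hand, xml2 (which rvest is built on) has url_absolute(), which resolves relative paths against a base URL. The sketch below hard-codes the three paths shown above so it runs without network access; base_url is a name introduced here for illustration.

```r
library(xml2)

# The three relative paths from the output above, hard-coded for the example.
relative_paths <- c("/currencies/bitcoin/", "/currencies/ethereum/", "/currencies/ripple/")

# Assumption: the page the links were scraped from serves as the base URL.
base_url <- "https://coinmarketcap.com/coins/views/all/"

# Resolve each relative path against the base, as a browser would.
absolute_path <- url_absolute(relative_paths, base_url)
absolute_path
```

This should also cope with links that are not root-relative, which plain paste0() would not.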
Finally, you can combine the URLs with the table in a data frame.
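A minimal sketch of that last step, assuming each table row contains exactly one link with the class used above. The inline HTML is a hypothetical two-row stand-in for the real page so the example runs offline; the url column name is also just an illustration.

```r
library(rvest)

# Hypothetical stand-in for the real page (not the actual site markup).
html <- '<table id="currencies-all">
  <tr><th>Name</th><th>Symbol</th></tr>
  <tr><td><a class="currency-name-container" href="/currencies/bitcoin/">Bitcoin</a></td><td>BTC</td></tr>
  <tr><td><a class="currency-name-container" href="/currencies/ethereum/">Ethereum</a></td><td>ETH</td></tr>
</table>'
page <- read_html(html)

# html_table() keeps only the cell text, so the links are extracted separately.
cryptocurrencies <- page %>%
  html_nodes(xpath = '//*[@id="currencies-all"]') %>%
  html_table()
cryptocurrencies <- cryptocurrencies[[1]]

relative_paths <- page %>%
  html_nodes(".currency-name-container") %>%
  html_attr("href")

# The columns only line up if there is one matching link per table row.
stopifnot(length(relative_paths) == nrow(cryptocurrencies))
cryptocurrencies$url <- paste0("https://coinmarketcap.com", relative_paths)
```

If the row counts differ on the real page, the stopifnot() check fails rather than silently misaligning names and URLs.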