I am trying to scrape an entire row from table no. 8 at the following URL: https://www.screener.in/company/HCLTECH/consolidated/
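library(rvest)
# URL of the page to scrape
webpage <- "https://www.screener.in/company/HCLTECH/consolidated/"
# download and parse the HTML
Webpage <- read_html(webpage)
# take the 8th <table> on the page and convert it to a data frame
CF <- Webpage %>%
  html_nodes("table") %>%
  .[8] %>%
  html_table(fill = TRUE)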
Answer 0 (score: 0)
I used RSelenium to click the plus buttons that expand the table. Here is my attempt:
library(rvest)
library(RSelenium)
# initialize RSelenium
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
# note: shell() is only available in R on Windows
shell(selCommand, wait = FALSE, minimized = TRUE)
remDr <- remoteDriver(port = 4567L, browserName = "chrome")
Sys.sleep(5)
remDr$open()
Sys.sleep(5)
# define and navigate to url
url <-"https://www.screener.in/company/HCLTECH/consolidated/"
remDr$navigate(url)
# click the plus buttons
plus_buttons <- remDr$findElements(using = 'css selector',"#cash-flow button.show-schedules.button-link")
for (plus_button in plus_buttons) {
  plus_button$clickElement()
}
# print the table
remDr$getPageSource(header = TRUE)[[1]] %>%
  read_html() %>%
  html_node("#cash-flow .data-table") %>%
  html_table()
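Since the question asks for one entire row, here is a minimal sketch of how a single row could be pulled out of the data frame that html_table() returns. To keep it runnable without a browser session, it parses a small inline HTML string instead of the live page. That markup, the row labels, the column headers, and the numbers are all made up for illustration; the real structure on screener.in may differ.

```r
library(rvest)

# Hypothetical stand-in for the string returned by remDr$getPageSource()[[1]];
# the labels and numbers below are invented for illustration only.
page_source <- '
<section id="cash-flow">
  <table class="data-table">
    <tr><th>Item</th><th>Mar 2018</th><th>Mar 2019</th></tr>
    <tr><td>Cash from Operating Activity</td><td>100</td><td>120</td></tr>
    <tr><td>Cash from Investing Activity</td><td>-40</td><td>-55</td></tr>
  </table>
</section>'

# Same selector as in the answer above, applied to the inline markup
cash_flow <- read_html(page_source) %>%
  html_node("#cash-flow .data-table") %>%
  html_table()

# The first column holds the row labels; keep only the row whose label matches
target_row <- cash_flow[cash_flow[[1]] == "Cash from Investing Activity", ]
print(target_row)
```

Selecting by label rather than by row position should keep working if clicking the plus buttons inserts extra rows above the one you want.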
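However, as @hrbrmstr pointed out, please check the website's terms of use and make sure you respect them. In my solution I chose to print the table rather than store it, so I am not "copying" anything from their site.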
Hope it helps! Let me know if you have any questions!