I am trying to scrape this website. I have written the login code with rvest,
but the form field names change every time the page is refreshed.
library(rvest)
loginpage <- "https://demo.glpi-project.org/"
pagesession <- html_session(loginpage)
pageform <- html_form(pagesession)[[1]]
formfill <- set_values(pageform, fielda5bd99dcd2eaa8 = "****",
fieldb5bd99dcd2eaad = "****")
successlogin <- submit_form(pagesession,formfill)
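fielda5bd99dcd2eaa8 and fieldb5bd99dcd2eaad are the names of the input fields, and they change on every refresh.
At the moment I have to edit the field names by hand every time I run the script.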
Answer 0 (score: 1)
Hopefully this is enough of a hint to point you in the right direction:
library(rvest)
library(httr)
library(dplyr)
httr::GET(
"https://demo.glpi-project.org/"
) -> res
pg <- httr::content(res)
form <- html_nodes(pg, "form")
inputs <- html_nodes(form, "input")
tibble(  # data_frame() is deprecated; dplyr re-exports tibble()
id = html_attr(inputs, "id"),
name = html_attr(inputs, "name"),
value = html_attr(inputs, "value")
)
## # A tibble: 6 x 3
##   id             name                value
##   <chr>          <chr>               <chr>
## 1 login_name     fielda5bd9bf41b7af9 NA
## 2 login_password fieldb5bd9bf41b7afe NA
## 3 NA             auth                local
## 4 login_remember fieldc5bd9bf41b7aff NA
## 5 NA             submit              Post
## 6 NA             _glpi_csrf_token    ea1aff0b53753e14a76077bd77fb21c2
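
The table suggests that the id attributes (login_name, login_password) stay fixed while the name attributes are regenerated, so one possible way forward is to look up the current name by id on each run and pass it to set_values() dynamically. The following is only a minimal sketch under that assumption: field_name() and login() are hypothetical helpers that are not part of rvest, and the code uses the same pre-1.0 rvest functions as the question (in rvest 1.0 and later these were renamed to session(), html_form_set() and session_submit()).

```r
library(rvest)

# Hypothetical helper: return the current `name` attribute of the <input>
# with the given `id`. Assumes the ids stay the same between page loads.
field_name <- function(page, id) {
  html_attr(html_node(page, paste0("input#", id)), "name")
}

# Hypothetical wrapper around the login steps from the question.
login <- function(url, user, password) {
  pagesession <- html_session(url)
  pageform <- html_form(pagesession)[[1]]

  # Build the named values at run time instead of hard-coding the names.
  values <- setNames(
    list(user, password),
    c(field_name(pagesession, "login_name"),
      field_name(pagesession, "login_password"))
  )

  # do.call() lets the argument names of set_values() come from a variable.
  formfill <- do.call(set_values, c(list(pageform), values))
  submit_form(pagesession, formfill)
}
```

With these definitions, something like successlogin <- login("https://demo.glpi-project.org/", "****", "****") should replace the hard-coded version. The hidden _glpi_csrf_token field is already part of the parsed form, so it is submitted along with the filled-in values.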