I am trying to scrape some table data from a password-protected website (I have a valid username/password) using R, but I haven't succeeded yet.
As an example, here is the login page for my dentist's website: http://www.deltadentalins.com/uc/index.html
I have tried the following:
library(httr)
download <- "https://www.deltadentalins.com/indService/faces/Home.jspx?_afrLoop=73359272573000&_afrWindowMode=0&_adf.ctrl-state=12pikd0f19_4"
terms <- "http://www.deltadentalins.com/uc/index.html"
# form fields pulled from the login page's HTML source
values <- list(username = "username", password = "password", TARGET = "",
               SMAUTHREASON = "", POSTPRESERVATIONDATA = "",
               bundle = "all", dups = "yes")
# submit the login form, then request the page behind it
POST(terms, body = values)
GET(download, query = values)
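As an offline sanity check on those form fields, here is a minimal sketch (base R only, no request is sent) of how a named list like `values` ends up encoded as an `application/x-www-form-urlencoded` POST body — the field names are the ones from the form above:

```r
# Encode a named list the way an HTML form POST body is encoded.
# No network access; this only illustrates the encoding httr performs.
values <- list(username = "username", password = "password",
               TARGET = "", SMAUTHREASON = "",
               bundle = "all", dups = "yes")
pairs <- mapply(function(k, v) {
  paste0(utils::URLencode(k, reserved = TRUE), "=",
         utils::URLencode(v, reserved = TRUE))
}, names(values), unlist(values))
body <- paste(pairs, collapse = "&")
body  # "username=username&password=password&TARGET=&SMAUTHREASON=&bundle=all&dups=yes"
```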
I have also tried:
your.username <- 'username'
your.password <- 'password'
require(SAScii)
require(RCurl)
require(XML)
agent <- "Firefox/23.0"
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
curl <- getCurlHandle()
curlSetOpt(
    cookiejar = 'cookies.txt',
    useragent = agent,
    followlocation = TRUE,
    autoreferer = TRUE,
    curl = curl
)
# list parameters to pass to the website (pulled from the source html)
params <-
list(
'lt' = "",
'_eventID' = "",
'TARGET' = "",
'SMAUTHREASON' = "",
'POSTPRESERVATIONDATA' = "",
'SMAGENTNAME' = agent,
'username' = your.username,
'password' = your.password
)
# log in through the SiteMinder form
html <- postForm('https://www.deltadentalins.com/siteminderagent/forms/login.fcc',
                 .params = params, curl = curl)
html
I can't get it to work. Are there any experts out there who can help?
Answer 0: (score: 1)
Updated 3/5/16 to work with the RSelenium package
#### FRONT MATTER ####
library(devtools)
library(RSelenium)
library(XML)
library(plyr)
######################
## This block opens a Firefox browser instance that is controlled from R
RSelenium::checkForServer()  # downloads the Selenium server binary if needed
startServer()                # starts the Selenium server
remDr <- remoteDriver()
remDr$open()
url <- "yoururl"
remDr$navigate(url)
The first section loads the needed packages, sets the login URL, and then opens it in a Firefox instance. I type in my username & password, and then I'm in and can start scraping.
infoTable <- readHTMLTable(remDr$getPageSource()[[1]], header = TRUE)
infoTable
Table1 <- infoTable[[1]]
Apps <- Table1[, 1]  # application numbers
For this example, the first page contains two tables. The first one is the one I'm interested in: a table of application numbers and names. I pull out the first column (the application numbers).
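That extraction step can be checked offline; the stubbed data frame below stands in for what `readHTMLTable` would return (a list of data frames):

```r
# readHTMLTable returns a list of data frames; [[1]] takes the first
# table and [, 1] pulls its first column (the application numbers).
# The data frame here is a stand-in for the real scraped table.
infoTable <- list(data.frame(AppNo = c("1001", "1002"),
                             Name  = c("Ann", "Bob"),
                             stringsAsFactors = FALSE))
Table1 <- infoTable[[1]]
Apps <- Table1[, 1]
Apps  # c("1001", "1002")
```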
Links2 <- paste("https://yourURL?ApplicantID=", Apps, sep = "")
The data I want is stored in the individual applications, so this bit creates the links that I want to loop over.
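The link-building step itself runs offline; `yourURL` and the application numbers below are placeholders:

```r
# paste() is vectorized, so one call builds one link per application number
Apps <- c("1001", "1002", "1003")  # stand-in for the scraped column
Links2 <- paste("https://yourURL?ApplicantID=", Apps, sep = "")
Links2[1]  # "https://yourURL?ApplicantID=1001"
```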
### Grab the contact-info table from each application page
LL <- lapply(seq_along(Links2),
    function(i) {
        remDr$navigate(Links2[i])
        infoTable <- readHTMLTable(remDr$getPageSource()[[1]], header = TRUE)
        # contact info sits in table 2 or table 3, so check which one has it
        if ("First Name" %in% colnames(infoTable[[2]])) {
            infoTable2 <- cbind(infoTable[[1]][1, ], infoTable[[2]][1, ])
        } else {
            infoTable2 <- cbind(infoTable[[1]][1, ], infoTable[[3]][1, ])
        }
        print(infoTable2)
    }
)
results <- do.call(rbind.fill, LL)
results
write.csv(results, "C:/pathway/results2.csv")
This last section loops through the link for each application and grabs the table with their contact info (either table 2 or table 3, so R has to check first). Thanks again to Chinmay Patil for the pointer to relenium!
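For reference, what `plyr::rbind.fill` does in that last `do.call` step can be sketched in base R — the two stub data frames below stand in for scraped contact tables with different columns:

```r
# Stack two data frames with different columns; missing cells become NA.
# A base-R sketch of plyr::rbind.fill for the two-frame case.
rbind_fill2 <- function(x, y) {
  cols <- union(names(x), names(y))
  x[setdiff(cols, names(x))] <- NA   # add y's missing columns to x
  y[setdiff(cols, names(y))] <- NA   # and vice versa
  rbind(x[cols], y[cols])            # align column order, then stack
}
a <- data.frame(AppID = "1001", FirstName = "Ann", stringsAsFactors = FALSE)
b <- data.frame(AppID = "1002", Phone = "555-0100", stringsAsFactors = FALSE)
results <- rbind_fill2(a, b)
results$Phone  # c(NA, "555-0100")
```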