Logging in with an R httr POST request

Time: 2014-04-21 17:35:42

Tags: r post login web-scraping httr

I am trying to log in to bondora.com with an httr POST request in R, since the site does not appear to use HTTP authentication:

library(httr)
login <- "https://www.bondora.com/en/login"
pars <- list(
    username = "MyUserName",
    password = "MyPassword"
    )
POST(login, body = pars)

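After a successful login, the site redirects the user to the landing page bondora.com/en/home. But when I parse the response to the POST request, I get the same page title as on the login page: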

library(XML)
test <- POST(login, body = pars)
test <- content(test, as = "text")
parsedHtml <- htmlParse(test, asText = TRUE)
xpathSApply(parsedHtml, "//title", xmlValue)
[1] "Join or log in|Loans and investing|Bondora"

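I tried the same technique on several other websites and it seems to work fine everywhere except on this one. The output of the POST command is as follows: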

POST(login, body = pars)
   Response [https://www.bondora.com/en/login]
      Status: 200
      Content-type: text/html; charset=utf-8
   <!DOCTYPE HTML>
   <html xmlns="http://www.w3.org/1999/xhtml">

...

Which specific settings should I use to log in at bondora.com/en/login?

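Update 1: Following @hadley's comment, I tried setting multipart to both TRUE and FALSE, but neither helped. I then inspected the request in the browser and added the same headers: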

login <- "https://www.bondora.com/en/authenticate"
pars <- list(
  username = "username",
  password = "password"
  )
headers <- list(
  "User-Agent" = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0",
  "Referer" = "https://www.bondora.com/en/login?returnurl=/en/home",
  "Host" = "www.bondora.com",
  "Connection" = "keep-alive",
  "Accept-Language" = "en-US,en;q=0.5",
  "Accept-Encoding" = "gzip, deflate",
  "Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
  )
POST(login, body = pars, add_headers(.headers = character(headers)))
Error in character(headers) : invalid 'length' argument

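It seems that I need to specify a length argument, as HTTP error 411 (Length Required) indicates. How should I do that? I also tried adding Content-Length = 9844 to the request headers, since that value appears in the response headers, but that did not work either.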

1 Answer:

Answer 0: (score: 2)

I was able to solve this by upgrading from httr_0.4 to httr_0.5.
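A side note on the `Error in character(headers)` message from Update 1: it looks like an R error rather than an HTTP 411 response. `character()` creates an empty character vector of a given length, so passing it a list fails before any request is sent. The following is a minimal sketch, assuming the goal was to turn the named list into the named character vector that `add_headers(.headers = ...)` accepts. The shortened header list and the names `header_vec` and `header_config` are illustrative only and not from the original post.

```r
# Header values copied from Update 1 (shortened here for illustration).
headers <- list(
  "Referer" = "https://www.bondora.com/en/login?returnurl=/en/home",
  "Accept-Language" = "en-US,en;q=0.5"
)

# character() expects a length, so character(headers) raises
# "invalid 'length' argument". unlist() flattens the list into a
# named character vector instead.
header_vec <- unlist(headers)

# Build the request configuration only if httr is installed.
# Nothing is sent over the network here.
if (requireNamespace("httr", quietly = TRUE)) {
  header_config <- httr::add_headers(.headers = header_vec)
  # The configuration could then be passed along with the request, e.g.:
  # httr::POST(login, body = pars, header_config)
}
```

Headers such as Host and Content-Length are normally filled in by the underlying HTTP client, so setting them by hand is probably unnecessary. This sketch only addresses the R error and does not by itself guarantee a successful login.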