Logging in to Booking.com with Mechanize

Time: 2015-04-16 15:05:59

Tags: ruby web-scraping mechanize

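I am trying to log in to Booking.com with Mechanize at this URL: https://admin.booking.com/hotel/hoteladmin/

So far I cannot get past the login step. I suspect they use a JavaScript function to set a csrf_token when the form is sent. This is the code I am using: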

login_url = "https://admin.booking.com/hotel/hoteladmin"
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.verify_mode = OpenSSL::SSL::VERIFY_NONE

# Get the login page
page = agent.get(login_url)

form = page.form_with(:name => 'myform')
form.loginname = my_username
form.password = my_password
form.add_field!("csrf_token", "empty-token")

# Submit the form
page = form.submit( form.button_with(:name => "Login") )

When I load the page in a browser, I get:

var token = '..................EXTRA-LONG-TOKEN..................' || 'empty-token',

But when I inspect it with Mechanize, I get:

var token = '' || 'empty-token',

The full page body as retrieved with Mechanize can be found here.


So do they use JavaScript to put this variable into a new field that is created when the form is submitted?

if (
    form &&
    form.method &&
    form.method.toLowerCase() === 'post' &&
    typeof form.elements.csrf_token === 'undefined'
) {
    input       =  doc.createElement( 'input' );
    input.name  = 'csrf_token';
    input.type  = 'hidden';
    input.value =  token;

    form.appendChild( input );
}
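
To check from Ruby which value that script would end up submitting, the `var token = ...` line can be read straight out of the page body. This is only a minimal sketch: the helper name `effective_csrf_token` is hypothetical, and the regular expression assumes the script line has exactly the shape quoted above.

```ruby
# Hypothetical helper: reproduces the page script's
#   var token = '<value>' || 'empty-token'
# for a given page body. Returns nil if no such line is found.
def effective_csrf_token(body)
  raw = body[/var token = '([^']*)'/, 1]
  return nil if raw.nil?

  # In JavaScript '' is falsy, so an empty literal falls back to 'empty-token'.
  raw.empty? ? "empty-token" : raw
end

puts effective_csrf_token("var token = '' || 'empty-token',")
```

With Mechanize, the page body would presumably be passed in as `effective_csrf_token(page.body)`.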

I also tried watching the Network tab in Firebug, without success. When the form is submitted, this sequence occurs:

302 - POST - login.html
302 - GET  - https://admin.booking.com/hotel/hoteladmin/index-hotel.html?page=&lang=xu&ses=89abb0da735818bc6252d69ece255276&t=1429195712.93074
302 - GET  - https://admin.booking.com/hotel/hoteladmin/extranet_ng/manage/index.html?lang=xu&ses=89abb0da735818bc6252d69ece255276&hotel_id=XXXXXX&t=1429195713.11779
200 - GET  - /home.html

When I inspect the POST request, I can see this under "Request data":

Content-Type: application/x-www-form-urlencoded
Content-Length: 95
ses=e7541870781128880d7c61aa1e4cc357&loginname=my_login&password=my_password&lang=xu&login=Login+

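So I do not know whether the csrf_token above is used at all, and if it is, I do not know where. I also do not know whether the csrf_token is what prevents me from logging in.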


Here are the request/response headers for a successful login in my browser:

---------- Request ----------
Host: admin.booking.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: https://admin.booking.com/hotel/hoteladmin/login.html
Cookie: cwd-extranet=1; ecid=RtSy3w%2Fk5BG5Z67OY8E2rQZz; slan=xu; auth_token=569054884; ut=e; _ga=GA1.2.357900853.1429171802
Connection: keep-alive
---------- Response ----------
Connection: keep-alive
Content-Type: text/html; charset=UTF-8
Date: Thu, 16 Apr 2015 14:57:24 GMT
Location: /hotel/hoteladmin/index-hotel.html?page=&lang=xu&ses=8df70f6f7699cf5c5d63271fbbb47bb1&t=1429196244.67621
Server: nginx
Set-Cookie: cwd-extranet=1; path=/; expires=Tue, 14-Apr-2020 14:57:24 GMT
slan=xu; path=/; expires=Wed, 18-May-2033 03:33:20 GMT; HttpOnly
Strict-Transport-Security: max-age=2592000
Transfer-Encoding: chunked

And here are the headers from Mechanize, where the login fails (no Location in the response headers?):

form encoding: utf-8
query: "ses=e1520f97a6e9056940b4cf4e90684836&loginname=my_login&password=my_password&lang=xu&csrf_token=empty-token"
Net::HTTP::Post: /hotel/hoteladmin/login.html
request-header: accept-encoding => gzip,deflate,identity
request-header: accept => */*
request-header: user-agent => Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.22 (KHTML, like Gecko) Version/5.1.1 Safari/534.51.22
request-header: accept-charset => ISO-8859-1,utf-8;q=0.7,*;q=0.7
request-header: accept-language => en-us,en;q=0.5
request-header: host => admin.booking.com
request-header: referer => https://admin.booking.com/hotel/hoteladmin/
request-header: content-type => application/x-www-form-urlencoded
request-header: content-length => 105
status: Net::HTTPOK 1.1 200 OK
response-header: server => nginx
response-header: date => Thu, 16 Apr 2015 14:39:22 GMT
response-header: content-type => text/html; charset=UTF-8
response-header: transfer-encoding => chunked
response-header: connection => keep-alive
response-header: vary => Accept-Encoding
response-header: x-powered-by => en105admapp-04
response-header: strict-transport-security => max-age=2592000
response-header: content-encoding => gzip
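
One way to narrow this down is to compare the two request bodies shown above field by field. The sketch below uses only the two bodies quoted in this post; the helper name `field_names` is hypothetical. It shows which fields differ, not which difference actually causes the failure.

```ruby
require 'uri'

# Request bodies copied from the browser trace and the Mechanize log above.
browser_body   = "ses=e7541870781128880d7c61aa1e4cc357&loginname=my_login&password=my_password&lang=xu&login=Login+"
mechanize_body = "ses=e1520f97a6e9056940b4cf4e90684836&loginname=my_login&password=my_password&lang=xu&csrf_token=empty-token"

# Hypothetical helper: names of the fields in a form-encoded body.
def field_names(body)
  URI.decode_www_form(body).map(&:first)
end

missing = field_names(browser_body) - field_names(mechanize_body)
extra   = field_names(mechanize_body) - field_names(browser_body)

puts "sent by the browser only: #{missing.inspect}"
puts "sent by Mechanize only:   #{extra.inspect}"
```

The browser's POST carries a `login` field and no `csrf_token`, while the Mechanize request has it the other way round.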

Thanks for your help.

1 answer:

Answer 0: (score: 0)

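I managed to solve this without handling the CSRF token at all.

What I did was follow the POST/GET sequence shown in Firebug; the only thing that matters is the SES token, which is a hidden field on the login form.

So, for the login POST, we have: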
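
The SES token has to be read from the login page before it can be posted. A minimal sketch of that step follows, assuming the field is rendered as an `<input ... name="ses" value="...">` tag; the real markup is not shown in this thread, and `extract_ses` is a hypothetical helper. A regular expression is used only to keep the example dependency-free; an HTML parser would be more robust.

```ruby
# Hypothetical helper: pulls the value of the hidden "ses" input out of the
# login page HTML. Returns nil when no such input is present.
def extract_ses(html)
  # Find the whole <input> tag that carries name="ses" ...
  tag = html[/<input\b[^>]*\bname=["']ses["'][^>]*>/i]
  return nil unless tag

  # ... then read its value attribute, whatever the attribute order.
  tag[/\bvalue=["']([^"']*)["']/i, 1]
end

sample = '<form name="myform"><input type="hidden" name="ses" value="abc123"></form>'
token = extract_ses(sample)
puts token
```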

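require 'net/http'
require 'uri'

# token is the hidden "ses" value from the login form; token, username,
# password and cookie are assumed to have been set beforehand.
uri = URI.parse("https://admin.booking.com/hotel/hoteladmin/login.html")
# encode_www_form escapes each value; URI.encode was removed in Ruby 3.0
data = URI.encode_www_form(lang: "en", login: "Login", ses: token, loginname: username, password: password)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Post.new(uri.request_uri)
request.body = data
request['Cookie'] = cookie
response = http.request(request)
# Keep the cookie and redirect target for the next request
cookie = response['set-cookie']
location = response['location']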
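
The login request above copies the raw Set-Cookie value into the next Cookie header, attributes such as `path` and `expires` included. That worked here, but a tidier variant is sketched below, assuming the individual Set-Cookie values are obtained with `response.get_fields('set-cookie')`. The helper name `cookie_header` is hypothetical.

```ruby
# Hypothetical helper: turns a list of Set-Cookie header values into a single
# Cookie request header value, keeping only the leading name=value pair.
def cookie_header(set_cookie_values)
  Array(set_cookie_values)
    .map { |value| value.split(";", 2).first.strip }
    .join("; ")
end

# Values taken from the browser response shown in the question.
values = [
  "cwd-extranet=1; path=/; expires=Tue, 14-Apr-2020 14:57:24 GMT",
  "slan=xu; path=/; expires=Wed, 18-May-2033 03:33:20 GMT; HttpOnly"
]
puts cookie_header(values)
```

With a live response this would presumably be called as `cookie_header(response.get_fields('set-cookie'))`.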

Then we follow the redirects, using the cookie and location from the previous response, until we receive a 200 response code, with:

# Repeat this request for each redirect until the status is 200
uri = URI.parse("https://admin.booking.com#{location}")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Get.new(uri.request_uri)
request['Cookie'] = cookie
response = http.request(request)
cookie = response['set-cookie']
location = response['location']
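
The "repeat until 200" step can be written as a loop. The following is only a sketch of that idea, not the exact code used here: `http_get`, `follow_redirects`, the `fetch:` parameter and the hop limit are all hypothetical names and choices. Unlike the snippet above, it keeps the previous cookie when a response sets none, and it resolves both relative and absolute Location values.

```ruby
require 'net/http'
require 'uri'

BASE = "https://admin.booking.com"

# Performs one GET and returns [status, location, set_cookie].
def http_get(uri, cookie)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Cookie'] = cookie if cookie
  response = http.request(request)
  [response.code.to_i, response['location'], response['set-cookie']]
end

# Follows Location headers until a 200 is returned or max_hops is reached.
# fetch is injectable so the loop can be exercised without network access.
def follow_redirects(location, cookie, max_hops: 10, fetch: method(:http_get))
  uri = URI.join(BASE, location)
  max_hops.times do
    status, location, new_cookie = fetch.call(uri, cookie)
    cookie = new_cookie || cookie          # keep the old cookie if none is set
    return [uri, cookie] if status == 200
    raise "status #{status} without a Location header" unless location

    uri = URI.join(uri.to_s, location)     # handles relative and absolute URLs
  end
  raise "gave up after #{max_hops} redirects"
end
```

After the login POST it would be called as `final_uri, cookie = follow_redirects(location, cookie)`.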