Understanding the response redirect after a POST

Time: 2017-04-23 21:34:35

Tags: python-3.x web-scraping python-requests basic-authentication urllib3

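I have been trying to work out how to get this code to work. I have tried several approaches, including a with statement and using urllib3 directly. Either the POST lands on a redirect back to the login page, or the data does not seem to be posted at all; either way the login fails. Below are the code, its output on the command line, and a capture of the same login made from Chrome. As far as I can tell, there is a response cookie that I am not handling correctly. Can you help me see what I have missed or got wrong? Thanks.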

Disclosure: some of the information has been changed, but there should be enough here to see what is going on.

import requests
import json


base = "http://www.XXXXXX.com/"
url = "http://www.XXXXXX.com/login.php"
scraped_url = "http://www.XXXXXX.com/cart.php?mode=wishlist"
headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36',
            "Connection":"keep-alive",
            "mode":"login",
            "Accept-Language":"en-US, en;q=0.8",
            "Accept-Encoding":"gzip, deflate, sdch",
            "store_language":"en",
            "RefererCookie":"deleted",#http%3A%2F%2Fwww.XXXXXX.com%2Fhome.php
            "Origin":"http://www.XXXXXX.com",
            "Upgrade-Insecure-Requests": "1",
            "P3P": "CP=NON CURa ADMa DEVa TAIa CONi OUR DELa BUS IND PHY ONL UNI PUR COM NAV DEM STA",
            "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8."
            }


data = {"password":"Reallybadpassword",
            "username":"TheEmail@Gmail.com"}


jar = requests.cookies.RequestsCookieJar()

req = requests.Request('GET', 'http://www.XXXXXX.com/login.php')
r = req.prepare()
s = requests.Session()
s.send(r)
s.headers.update
s.jar.update

r = s.get(base, cookies=jar)
s.headers.update
jar.update
print(r.history)
print(r.status_code)
print(r.url)
print(r.json)
print("################################################################################")
print(r.headers)
print("********************************************************************************")
print(r.cookies)
print("////////////////////////////////////////////////////////////////////////////////")


r = s.post(url, params=data, cookies=jar, allow_redirects=True, headers=headers) #params=values,
s.headers.update
jar.update

print(r.history)
print(r.status_code)
print(r.url)
print(r.json)
print("################################################################################")
print(r.headers)
print("********************************************************************************")
print(r.cookies)
print("////////////////////////////////////////////////////////////////////////////////")


r = s.get(scraped_url, params=data, cookies=jar)
s.headers.update
jar.update

print(r.history)
print(r.status_code)
print(r.url)
print(r.json)
print("################################################################################")
print(r.headers)
print("********************************************************************************")
print(r.cookies)
print("////////////////////////////////////////////////////////////////////////////////")


s.close()

    #####-RESPONSE FROM SITE TO CLI######
    Iam@Groot:/media/Iam/Drive/Backup/Documents/TestWebpages$ python3 login_scraper.py 
    []
    200
    http://www.XXXXXX.com/
    <bound method Response.json of <Response [200]>>
    ################################################################################
    CaseInsensitiveDict({'pragma': 'no-cache', 'server': 'nginx', 'last-modified': '
    Sun, 23 Apr 2017 20:33:45 GMT', 'content-type': 'text/html; charset=iso-8859-1',
     'vary': 'Accept-Encoding', 'p3p': 'CP="NON CURa ADMa DEVa TAIa CONi OUR DELa BU
     S IND PHY ONL UNI PUR COM NAV DEM STA"', 'set-cookie': 'xid_eb442=e8278c9dc379c
     6dfae862b7bf2721138; path=/; domain=www.XXXXXX.com; httponly, RefererCookie=del
     eted; expires=Sat, 23-Apr-2016 20:33:44 GMT; path=/; domain=www.XXXXXX.com; htt
     ponly', 'date': 'Sun, 23 Apr 2017 20:33:45 GMT', 'transfer-encoding': 'chunked'
     , 'expires': 'Mon, 26 Jul 1997 05:00:00 GMT', 'connection': 'keep-alive', 'x-po
     wered-by': 'PleskLin', 'cache-control': 'no-store, no-cache, must-revalidate, p
     ost-check=0, pre-check=0', 'content-encoding': 'gzip'})
    ********************************************************************************
    <RequestsCookieJar[<Cookie xid_eb442=e8278c9dc379c6dfae862b7bf2721138 for .www.XXXXXX.com/>]>
    ////////////////////////////////////////////////////////////////////////////////
    []
    200
    http://www.XXXXXX.com/login.php
    <bound method Response.json of <Response [200]>>
    ################################################################################
    CaseInsensitiveDict({'pragma': 'no-cache', 'server': 'nginx', 'last-modified': '
    Sun, 23 Apr 2017 20:33:45 GMT', 'content-type': 'text/html; charset=iso-8859-1',
     'vary': 'Accept-Encoding', 'p3p': 'CP="NON CURa ADMa DEVa TAIa CONi OUR DELa BU
     S IND PHY ONL UNI PUR COM NAV DEM STA"', 'set-cookie': 'xid_eb442=649b55131f4f7
     e9d6958560d3ed406c0; path=/; domain=www.XXXXXX.com; httponly, RefererCookie=del
     eted; expires=Sat, 23-Apr-2016 20:33:44 GMT; path=/; domain=www.XXXXXX.com; htt
     ponly', 'date': 'Sun, 23 Apr 2017 20:33:45 GMT', 'transfer-encoding': 'chunked'
     , 'expires': 'Mon, 26 Jul 1997 05:00:00 GMT', 'connection': 'keep-alive', 'x-po
     wered-by': 'PleskLin', 'cache-control': 'no-store, no-cache, must-revalidate, p
     ost-check=0, pre-check=0', 'content-encoding': 'gzip'})
    ********************************************************************************
    <RequestsCookieJar[<Cookie xid_eb442=649b55131695e9ff48560d3ed406c0 for .www.XXXXXX.com/>]>
    ////////////////////////////////////////////////////////////////////////////////
    (<Response [302]>,)
    200
    http://www.XXXXXX.com/login.php
    <bound method Response.json of <Response [200]>>
    ################################################################################
    CaseInsensitiveDict({'pragma': 'no-cache', 'server': 'nginx', 'last-modified': '
    Sun, 23 Apr 2017 20:33:45 GMT', 'content-type': 'text/html; charset=iso-8859-1',
     'vary': 'Accept-Encoding', 'p3p': 'CP="NON CURa ADMa DEVa TAIa CONi OUR DELa BU
     S IND PHY ONL UNI PUR COM NAV DEM STA"', 'set-cookie': 'xid_eb442=649b55131f4f7
     e9d6958560d3ed406c0; path=/; domain=www.XXXXXX.com; httponly, RefererCookie=del
     eted; expires=Sat, 23-Apr-2016 20:33:45 GMT; path=/; domain=www.XXXXXX.com; htt
     ponly', 'date': 'Sun, 23 Apr 2017 20:33:46 GMT', 'transfer-encoding': 'chunked'
     , 'expires': 'Mon, 26 Jul 1997 05:00:00 GMT', 'connection': 'keep-alive', 'x-po
     wered-by': 'PleskLin', 'cache-control': 'no-store, no-cache, must-revalidate, p
     ost-check=0, pre-check=0', 'content-encoding': 'gzip'})
    ********************************************************************************
    <RequestsCookieJar[<Cookie xid_eb442=649b55131695e9ff48560d3ed406c0 for .www.XXXXXX.com/>]>
    ////////////////////////////////////////////////////////////////////////////////
    Iam@Groot:/media/Iam/Drive/Backup/Documents/TestWebpages$ 











    #####-FROM CHROME LOGIN-######

    Iam@Groot:~$ sudo ngrep -W byline -d  any -q "XXXXXX"
    interface: any
    match: XXXXXX

    T 192.168.1.5:56250 -> 22.156.106.185:80 [AP]
    POST /login.php HTTP/1.1.
    Host: www.XXXXXX.com.
    Connection: keep-alive.
    Content-Length: 133.
    Cache-Control: max-age=0.
    Origin: http://www.XXXXXX.com.
    Upgrade-Insecure-Requests: 1.
    User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36.
    Content-Type: application/x-www-form-urlencoded.
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8.
    Referer: http://www.XXXXXX.com/home.php.
    Accept-Encoding: gzip, deflate.
    Accept-Language: en-US,en;q=0.8.
    Cookie: store_language=en; xid_eb442C_remember=TheEmail%40Gmail.com; RefererCookie=http%3A%2F%2Fwww.XXXXXX.com%2Fhome.php; GreetingCookie=Mr.+John+Doe; xid_eb442=649b55131695e9ff48560d3ed406c0.
    .
    xid_eb442=649b55131695e9ff48560d3ed406c0&is_remember=&mode=login&username=TheEmail%40Gmail.com&password=Reallybadpassword

    T 22.156.106.185:80 -> 192.168.1.5:56250 [AP]
    HTTP/1.1 302 Found.
    Server: nginx.
    Date: Sun, 23 Apr 2017 20:03:04 GMT.
    Content-Type: text/html; charset=iso-8859-1.
    Transfer-Encoding: chunked.
    Connection: keep-alive.
    Expires: Mon, 26 Jul 1997 05:00:00 GMT.
    Last-Modified: Sun, 23 Apr 2017 20:03:03 GMT.
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0.
    Pragma: no-cache.
    P3P: CP="NON CURa ADMa DEVa TAIa CONi OUR DELa BUS IND PHY ONL UNI PUR COM NAV DEM STA".
    Set-Cookie: xid_eb442=7e00038d2gd56gtyh73cb97b1225afe; path=/; domain=www.XXXXXX.com; httponly.
    Set-Cookie: GreetingCookie=Mr.+John+Doe; expires=Fri, 20-Oct-2017 20:03:03 GMT; path=/; domain=www.XXXXXX.com; httponly.
    Location: http://www.XXXXXX.com/home.php.
    X-Powered-By: PleskLin.
    .
    d4.
    <br /><br />If the page is not updated in 2 seconds, please follow this link: <a href="http://www.XXXXXX.com/home.php">continue &gt;&gt;</a><meta http-equiv="Refresh" content="0;URL=http://www.XXXXXX.com/home.php" />.


    T 192.168.1.5:56250 -> 22.156.106.185:80 [AP]
    GET /home.php HTTP/1.1.
    Host: www.XXXXXX.com.
    Connection: keep-alive.
    Cache-Control: max-age=0.
    Upgrade-Insecure-Requests: 1.
    User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36.
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8.
    Referer: http://www.XXXXXX.com/home.php.
    Accept-Encoding: gzip, deflate, sdch.
    Accept-Language: en-US,en;q=0.8.
    Cookie: store_language=en; xid_eb442C_remember=TheEmail%40Gmail.com; RefererCookie=http%3A%2F%2Fwww.XXXXXX.com%2Fhome.php; xid_eb442=7e00038d2gd56gtyh73cb97b1225afe; GreetingCookie=Mr.+John+Doe.
    .


    T 22.156.106.185:80 -> 192.168.1.5:56250 [A]
    HTTP/1.1 200 OK.
    Server: nginx.
    Date: Sun, 23 Apr 2017 20:03:04 GMT.
    Content-Type: text/html; charset=iso-8859-1.
    Transfer-Encoding: chunked.
    Connection: keep-alive.
    Vary: Accept-Encoding.
    Expires: Mon, 26 Jul 1997 05:00:00 GMT.
    Last-Modified: Sun, 23 Apr 2017 20:03:04 GMT.
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0.
    Pragma: no-cache.
    P3P: CP="NON CURa ADMa DEVa TAIa CONi OUR DELa BUS IND PHY ONL UNI PUR COM NAV DEM STA".
    Set-Cookie: xid_eb442=7e00038d2gd56gtyh73cb97b1225afe; path=/; domain=www.XXXXXX.com; httponly.
    X-Powered-By: PleskLin.
    Content-Encoding: gzip.


    T 192.168.1.5:56250 -> 22.156.106.185:80 [AP]
    GET /home.php HTTP/1.1.
    Host: www.XXXXXX.com.
    Connection: keep-alive.
    Cache-Control: max-age=0.
    Upgrade-Insecure-Requests: 1.
    User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36.
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8.
    Referer: http://www.XXXXXX.com/home.php.
    Accept-Encoding: gzip, deflate, sdch.
    Accept-Language: en-US,en;q=0.8.
    Cookie: store_language=en; xid_eb442C_remember=TheEmail%40Gmail.com; RefererCookie=http%3A%2F%2Fwww.XXXXXX.com%2Fhome.php; xid_eb442=7e00038d2gd56gtyh73cb97b1225afe; GreetingCookie=Mr.+John+Doe.
    .


    T 22.156.106.185:80 -> 192.168.1.5:56250 [A]
    HTTP/1.1 200 OK.
    Server: nginx.
    Date: Sun, 23 Apr 2017 20:03:04 GMT.
    Content-Type: text/html; charset=iso-8859-1.
    Transfer-Encoding: chunked.
    Connection: keep-alive.
    Vary: Accept-Encoding.
    Expires: Mon, 26 Jul 1997 05:00:00 GMT.
    Last-Modified: Sun, 23 Apr 2017 20:03:04 GMT.
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0.
    Pragma: no-cache.
    P3P: CP="NON CURa ADMa DEVa TAIa CONi OUR DELa BUS IND PHY ONL UNI PUR COM NAV DEM STA".
    Set-Cookie: xid_eb442=7e00038d2gd56gtyh73cb97b1225afe; path=/; domain=www.XXXXXX.com; httponly.
    X-Powered-By: PleskLin.
    Content-Encoding: gzip.
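
One way to compare the browser's request with the script's is to decode the body that Chrome sent in the POST /login.php capture above. This is a minimal sketch using only the standard library; the names captured_body and form_fields are illustrative and do not come from the original code.

```python
from urllib.parse import parse_qsl

# Body of the POST /login.php request, copied from the ngrep capture.
captured_body = (
    "xid_eb442=649b55131695e9ff48560d3ed406c0&is_remember=&mode=login"
    "&username=TheEmail%40Gmail.com&password=Reallybadpassword"
)

# keep_blank_values=True keeps empty fields such as is_remember.
form_fields = dict(parse_qsl(captured_body, keep_blank_values=True))

for name, value in form_fields.items():
    print(f"{name} = {value!r}")
```

The decoded fields show that mode=login travels in the form body next to the username and password, whereas the script above sends "mode" as a request header.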

1 Answer:

Answer 0 (score: 0)

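Found the missing piece. I hope this helps someone in the future. The mode: login entry needed to be passed in the form data rather than in the headers; that one change made everything work.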

headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36',
            "Connection":"keep-alive",
            "Accept-Language":"en-US, en;q=0.8",
            "Accept-Encoding":"gzip, deflate, sdch",
            "store_language":"en",
            "RefererCookie":"deleted",#http%3A%2F%2Fwww.XXXXXX.com%2Fhome.php
            "Origin":"http://www.XXXXXX.com",
            "Upgrade-Insecure-Requests": "1",
            "P3P": "CP=NON CURa ADMa DEVa TAIa CONi OUR DELa BUS IND PHY ONL UNI PUR COM NAV DEM STA",
            "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
            }

data = {"password":"Reallybadpassword",
            "username":"TheEmail@Gmail.com",
            "mode":"login",
            }