一段时间以来,我一直在尝试登录该网站并下载一些文件。我找不到下面的代码有什么问题。我是Python的新手:
当您点击登录页面https://www.targetsite/members/login.php时,它将进行以下调用:
登录页面:
[General]
Request URL: https://www.targetsite/members/login.php
Request Method: GET
Status Code: 302 Found
Remote Address: 0.0.0.0:443
Referrer Policy: no-referrer-when-downgrade
[Response Headers]
Connection: keep-alive
Content-Length: 425
Content-Type: text/html; charset=iso-8859-1
Date: Thu, 13 Dec 2018 14:37:02 GMT
Keep-Alive: timeout=5
Location: https://www.targetsite/auth.form?bWFmYkpTbjhNN0J4bWM2S2NwaCtlNTkydDJJV0xiMk1aTWRKa0kwVldWZ29hZjdaMEUweFJuRWl3a3NqOVUwTwpSbmtyRnptekp6Wm40VlM2MDF5dWVVSDR2V1FXMG5JU1ZNQVUrZ0lvYTlXTWQ4T2ZEVzJVY1ZSWW4wZk1NVHZhCm9uZWZZeTI2V1JNPQo=
Server: nginx/1.14.2
Set-Cookie: pcar%5fUkVTVFJJQ1RFRA%3d%3d=; path=/; domain=.targetsite; expires=Wed 13-Dec-2017 14:37:02 GMT
X-Vegas-No-Cache: YES
[Request Headers]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Cookie: pcah=SXlOK1VkSytiMTl0VllvbDk4N2tVaXR5bmZFZmNNVUsK
Host: www.targetsite
Referer: http://www.targetsite/tour/index.php
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36
AuthForm:
[General]
Request URL: https://www.targetsite/auth.form?bWFmYkpTbjhNN0J4bWM2S2NwaCtlNTkydDJJV0xiMk1aTWRKa0kwVldWZ29hZjdaMEUweFJuRWl3a3NqOVUwTwpSbmtyRnptekp6Wm40VlM2MDF5dWVVSDR2V1FXMG5JU1ZNQVUrZ0lvYTlXTWQ4T2ZEVzJVY1ZSWW4wZk1NVHZhCm9uZWZZeTI2V1JNPQo=
Request Method: GET
Status Code: 200 OK
Remote Address: 0.0.0.0:443
Referrer Policy: no-referrer-when-downgrade
[Response Headers]
Cache-Control: max-age=1, must-revalidate
Connection: keep-alive
Content-Type: text/html
Date: Thu, 13 Dec 2018 14:37:03 GMT
Expires: Thu, 13 Dec 2018 14:37:03 GMT
Keep-Alive: timeout=5
Server: nginx/1.14.2
Set-Cookie: pcar%5fUkVTVFJJQ1RFRA%3d%3d=; path=/; domain=.targetsite; expires=Wed 13-Dec-2017 14:37:03 GMT
Transfer-Encoding: chunked
[Request Headers]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Cookie: pcah=SXlOK1VkSytiMTl0VllvbDk4N2tVaXR5bmZFZmNNVUsK
Host: www.targetsite
Referer: http://www.targetsite/tour/index.php
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36
bWFmYkpTbjhNN0J4bWM2S2NwaCtlNTkydDJJV0xiMk1aTWRKa0kwVldWZ29hZjdaMEUweFJuRWl3a3NqOVUwTwpSbmtyRnptekp6Wm40VlM2MDF5dWVVSDR2V1FXMG5JU1ZNQVUrZ0lvYTlXTWQ4T2ZEVzJVY1ZSWW4wZk1NVHZhCm9uZWZZeTI2V1JNPQo:
因此,基于使用chrome inspector观察到的行为,我编写了以下代码,试图模拟访问https://www.targetsite.com/members/login.php
后触发的操作#Libraries
import requests
import json
from lxml import html
#URL
primeiraUrl = 'https://www.targetsite.com/members/login.php'
urlPost = 'https://ams.targetsite.com/auth.form'
#Credentials
userd = 'user'
passwd = 'pass'
session = requests.Session()
#session.verify = False
#GetToken
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Connection': 'keep-alive',
'Referer': 'http://www.targetsite.com/tour/index.php',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
get1stContact = session.get(primeiraUrl,headers=headers)
segundaUrl = get1stContact.url
get2ndContact = session.get(segundaUrl,headers=headers)
然后,当您登录网站时,您将获得:
[General]
Request URL: https://ams.targetsite.com/auth.form
Request Method: POST
Status Code: 302 Found
Remote Address: 1.1.1.1:443
Referrer Policy: no-referrer-when-downgrade
[Response Headers]
Connection: Keep-Alive
Content-Length: 239
Content-Type: text/html; charset=iso-8859-1
Date: Thu, 13 Dec 2018 14:47:57 GMT
Keep-Alive: timeout=20, max=94
Location: http://www.targetsite.com/members/index.php
Server: Apache/2.2.15 (CentOS)
Set-Cookie: pcar%5fUkVTVFJJQ1RFRA%3d%3d=cS90NW9XLzVVeVNIeElMOUpFaHlCb2hGWkZveVUrdTFiK0dad0FYVDN2UT0K; path=/; domain=.targetsite.com; expires=Thu 13-Dec-2018 20:47:57 GMT
[Request Headers]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 147
Content-Type: application/x-www-form-urlencoded
Cookie: pcah=Q3BLRnpDUGRhcnJFMmg1OGI0LzBrLzNhYWM5cjBVV2IK
Host: ams.targetsite.com
Origin: https://www.targetsite.com
Referer: https://www.targetsite.com/auth.form?N2dFMUIwaGFIc1BWQ3BoRTd2NVBWayt5ZE91UnZsa2xCcmNUU1VtVG8yNW54WUhjNFBYblE3STJwK2xrRWhNawpNRWtmMjJtUFF2Y0xSL2t1N2xIc2pmSk4wZG5uRVdmbkEyRUpxdnVDODI4UmVhMjlId1h6dVZIeFRtWGZuUGd5CkoySHEwZXg5RnRVPQo=
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36
[Form-Data]
rlm: RESTRICTED
for: http%3a%2f%2fwww%2etargetsite%2ecom%2fmembers%2findex%2ephp
rmb: y
uid: user
pwd: pass
这是我为发出该发布请求而编写的代码:
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Origin': 'https://www.targetsite.com',
'Referer': segundaUrl,
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
body = {
'rlm':'RESTRICTED',
'for':'http://www.targetsite.com/members/index.php',
'rmb': 'y',
'uid': 'user',
'pwd': 'pass'
}
#session.post(url)
r = session.post(urlPost, headers=headers, data=body)
所有人都说有谁可以帮助我解决这个问题?预先感谢!
编辑。:按要求提供完整代码:
#Help
#http://kazuar.github.io/scraping-tutorial/
#Libraries
import requests
import json
from lxml import html
#URL
primeiraUrl = 'https://www.targetsite.com/members/login.php'
urlPost = 'https://ams.targetsite.com/auth.form'
#Credentials
userd = 'user'
passwd = 'pass'
session = requests.Session()
#session.verify = False
#GetToken
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Connection': 'keep-alive',
'Referer': 'http://www.targetsite.com/tour/index.php',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
get1stContact = session.get(primeiraUrl,headers=headers)
segundaUrl = get1stContact.url
get2ndContact = session.get(segundaUrl,headers=headers)
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Content-Type': 'application/x-www-form-urlencoded',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Origin': 'https://www.targetsite.com',
'Referer': segundaUrl,
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
body = {
'rlm':'RESTRICTED',
'for':'http://www.targetsite.com/members/index.php',
'rmb': 'y',
'uid': 'user',
'pwd': 'pass'
}
#session.post(url)
r = session.post(urlPost, headers=headers, data=body)
print(r.text)
答案 0 :(得分:0)
您缺少发送一些标头的方法,最重要的是:
Content-Type: application/x-www-form-urlencoded
这很重要,因为它告诉服务器如何解析表单中发送的参数。