I first log in using Selenium, then pass the cookies to a requests session:
from wsm_login import wsm_login
import requests
from lxml import html

username = "username"
password = "password"
port = 9150
url = "http://wallst4qihu6lvsa.onion/login"

# Log in with Selenium and get the browser cookies back
cookies = wsm_login(username=username, password=password, url=url, port=port)

# Send all traffic of the requests session through the local SOCKS proxy
session = requests.session()
session.proxies = {}
session.proxies['http'] = 'socks5h://localhost:9150'
session.proxies['https'] = 'socks5h://localhost:9150'
headers = {
    "User-Agent":
        "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0"
}
session.headers.update(headers)

# Copy each Selenium cookie into the requests session
for cookie in cookies:
    c = {cookie['name']: cookie['value']}
    session.cookies.update(c)
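I want to browse the pages on the site, but to do that I have to interact with a form and press buttons to move between categories/pages. After manually clicking one of the categories, these are the headers and parameters of the POST request that I copied from Firefox: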
POST /index HTTP/1.1
Host: wallst4qihu6lvsa.onion
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://wallst4qihu6lvsa.onion/index
Content-Type: application/x-www-form-urlencoded
Content-Length: 315
Cookie: PHPSESSID=o1gro0908lrel3s1lb7q3k90mn
Connection: keep-alive
Upgrade-Insecure-Requests: 1
form[_token]=AZUhb0nkdd3f_KnzzwKw8tRnime99UvHZoMGwZhBOzk
form[catT]
form[catM]
form[catB]
form[searchTerm]
menuCatT=1
form[limit]=15
form[rating]
form[vendorLevel]
form[vendoractivity]=0
form[quantity]
form[maxpricepunit]
form[shipsfrom]=0
form[shipsto]=0
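A capture like the one above can be turned into a dictionary instead of being retyped by hand, which avoids losing the exact field names. The sketch below is only an illustration: `parse_captured_params` is a hypothetical helper, and it assumes that a line without `=` (as Firefox displays it) means the field was sent with an empty value.

```python
def parse_captured_params(text):
    """Turn 'name=value' lines (or bare 'name' lines) into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only; a bare name yields an empty value
        name, _, value = line.partition("=")
        params[name] = value
    return params


# Parameters exactly as copied from the Firefox capture above
captured = """\
form[_token]=AZUhb0nkdd3f_KnzzwKw8tRnime99UvHZoMGwZhBOzk
form[catT]
form[catM]
form[catB]
form[searchTerm]
menuCatT=1
form[limit]=15
form[rating]
form[vendorLevel]
form[vendoractivity]=0
form[quantity]
form[maxpricepunit]
form[shipsfrom]=0
form[shipsto]=0
"""

category_params = parse_captured_params(captured)
for name, value in category_params.items():
    print(repr(name), "->", repr(value))
```

Note that the keys keep the `form[...]` wrapper exactly as the browser sent it; only `menuCatT` is unwrapped in this capture.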
I successfully replicated this manual action in Python with the following code:
url = "http://wallst4qihu6lvsa.onion/index"
page_content = session.get(url)
tree = html.fromstring(page_content.text)
token = tree.xpath("//input[@id='form__token']")[0].get("value")
PHPSESSID = session.cookies['PHPSESSID']
headers['Connection'] = "keep-alive"
headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0"
headers['Accept'] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
headers['Accept-Language'] = "en-US,en;q=0.5"
headers['Accept-Encoding'] = "gzip, deflate"
headers['Referer'] = "http://wallst4qihu6lvsa.onion/index"
headers['Content-Type'] = "application/x-www-form-urlencoded"
headers['Content-Length'] = "315"
headers['Upgrade-Insecure-Requests'] = "1"
headers['PHPSESSID'] = PHPSESSID
payload = {
    '_token': token,
    'catT': "",
    'catM': "",
    'catB': "",
    'searchTerm': "",
    'limit': "15",
    'rating': "",
    'vendorLevel': "",
    'vendoractivity': "0",
    'quantity': "",
    'maxpricepunit': "",
    'shipsfrom': "0",
    'shipsto': "0",
    'menuCatT': "1"
}
content = session.post(url=url, headers=headers, data=payload)
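After navigating to the category, some buttons appear that let me move between pages. This is what I copied and pasted from the POST request after doing this manually in Firefox: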
POST /index HTTP/1.1
Host: wallst4qihu6lvsa.onion
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://wallst4qihu6lvsa.onion/index
Content-Type: application/x-www-form-urlencoded
Content-Length: 357
Cookie: PHPSESSID=o1gro0908lrel3s1lb7q3k90mn
Connection: keep-alive
Upgrade-Insecure-Requests: 1
form[_token]=AZUhb0nkdd3f_KnzzwKw8tRnime99UvHZoMGwZhBOzk
form[catT]=1
form[catM]=0
form[catB]=0
form[searchTerm]
form[limit]=15
form[rating]=0
form[vendorLevel]=1
form[vendoractivity]=0
form[quantity]=0
form[maxpricepunit]=0
form[shipsfrom]=0
form[shipsto]=0
form[sort]=pop_week_desc
form[page]=2
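To see what actually differs between the two captured requests, the parameters can be compared field by field. This is a small debugging sketch, not part of the original code: `diff_params` is a hypothetical helper, and the two dictionaries are typed in from the captures, with fields shown without a value assumed to be empty strings.

```python
TOKEN = "AZUhb0nkdd3f_KnzzwKw8tRnime99UvHZoMGwZhBOzk"

# Parameters of the category click (first capture)
category_click = {
    "form[_token]": TOKEN, "form[catT]": "", "form[catM]": "", "form[catB]": "",
    "form[searchTerm]": "", "menuCatT": "1", "form[limit]": "15",
    "form[rating]": "", "form[vendorLevel]": "", "form[vendoractivity]": "0",
    "form[quantity]": "", "form[maxpricepunit]": "",
    "form[shipsfrom]": "0", "form[shipsto]": "0",
}

# Parameters of the page click (second capture)
page_click = {
    "form[_token]": TOKEN, "form[catT]": "1", "form[catM]": "0", "form[catB]": "0",
    "form[searchTerm]": "", "form[limit]": "15", "form[rating]": "0",
    "form[vendorLevel]": "1", "form[vendoractivity]": "0", "form[quantity]": "0",
    "form[maxpricepunit]": "0", "form[shipsfrom]": "0", "form[shipsto]": "0",
    "form[sort]": "pop_week_desc", "form[page]": "2",
}


def diff_params(before, after):
    """Return (removed names, added names, {name: (old, new)} for changed values)."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }
    return removed, added, changed


removed, added, changed = diff_params(category_click, page_click)
print("removed:", removed)
print("added:  ", added)
for name, (old, new) in changed.items():
    print(f"changed: {name}: {old!r} -> {new!r}")
```

In these two captures the token is identical, `menuCatT` is no longer sent, and `form[sort]` and `form[page]` are new.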
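I have not been able to replicate this action. The response just sends me back to the original index page. Here is my attempt (which I run immediately after the previous block of code):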
headers['Content-Length'] = "357"
payload = {
    '_token': token,
    'catT': "1",
    'catM': "0",
    'catB': "0",
    'searchTerm': "",
    'limit': "15",
    'rating': "0",
    'vendorLevel': "1",
    'vendoractivity': "0",
    'quantity': "0",
    'maxpricepunit': "0",
    'shipsfrom': "0",
    'shipsto': "0",
    'page': '2',
    'sort': "pop_week_desc"
}
content = session.post(url=url, headers=headers, data=payload)
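One visible difference between the Firefox capture and this attempt is the field names: the browser sends `form[page]`, `form[sort]` and so on, while the payload keys here are bare (`page`, `sort`). Whether that is what causes the redirect is not confirmed. The sketch below only shows how the two encoded bodies differ, using the standard library. `with_form_prefix` is a hypothetical helper and `"TOKEN"` is a placeholder for the value read from the page.

```python
from urllib.parse import urlencode, parse_qsl


def with_form_prefix(fields, prefix="form", bare=("menuCatT",)):
    """Rename 'page' to 'form[page]' etc.; names listed in `bare` stay unwrapped."""
    return {
        (name if name in bare else f"{prefix}[{name}]"): value
        for name, value in fields.items()
    }


# Same fields as the attempt above; "TOKEN" stands in for the real token
fields = {
    "_token": "TOKEN", "catT": "1", "catM": "0", "catB": "0", "searchTerm": "",
    "limit": "15", "rating": "0", "vendorLevel": "1", "vendoractivity": "0",
    "quantity": "0", "maxpricepunit": "0", "shipsfrom": "0", "shipsto": "0",
    "page": "2", "sort": "pop_week_desc",
}

plain_body = urlencode(fields)                        # what bare keys produce
bracketed_body = urlencode(with_form_prefix(fields))  # names as in the capture

print(plain_body)
print(bracketed_body)
print(len(plain_body), len(bracketed_body))

# Decoding the bracketed body gives back the names seen in the capture
decoded = dict(parse_qsl(bracketed_body, keep_blank_values=True))
print(decoded["form[page]"])
```

If the bracketed names turn out to matter, the renamed dictionary could be passed as `data=` to `session.post`. Two related details from the capture: the browser sent `PHPSESSID` inside the `Cookie` header, not as a header of its own, and the body length depends on the token and field names, so a hard-coded `Content-Length` may not match. As far as I know, requests fills in `Content-Length` from the body by itself.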