I'm using Python 3.2 with the latest version of the Requests library. When I perform the first HTTP POST to the login endpoint, I log in successfully. Now I want to post some data after logging in. When I perform the second POST, I get a valid response (200 OK), but the redirect URL of that second POST is example.com/index. Normally it should be example.com/my-account, but when I refresh my browser to check the page, nothing has changed.
Here is my code:
import sys
import requests
from lxml import etree
from bs4 import BeautifulSoup
from furl import furl

# user credentials
mail = "thoryn.hiroshi@lcelandic.com"
password = "333"
url = 'https://example.com/login.php'

# make all the requests in one session
with requests.Session() as client:
    # retrieve the CSRF (sid) token first
    tree = etree.HTML(client.get(url).content)
    csrf = tree.xpath('//input[@name="_sid"]/@value')[0]
    # form data
    formData = dict(_sid=csrf, email=mail, pwd=password, process="login")
    # use the same session client
    r = client.post(url, data=formData)
    idurl = r.url
    print(r.request.headers)
    g = client.get(idurl)

    # scrape the page HTML; I don't think this part is related to the requests problem
    soup = BeautifulSoup(g.text, 'html.parser')
    adispo = soup.find_all('a', attrs={"class": "dispo"})
    datalist = []
    for link in adispo:
        linkajax = link['onclick']
        parsedlink = linkajax[30:247]
        temp = furl(parsedlink)
        data = {
            "timestamp": temp.args['timestamp'],
            "skey": temp.args['skey'],
            "id": temp.args['fg_id'],
            "time": temp.args['result']
        }
        datalist.append(data)
    # print(rdvlist, len(rdvlist))
    getdata = datalist[0]

    posturl = 'https://example.com/action.php'
    datapayload = {
        "data name": "data"
    }
    secondpost = client.post(posturl, data=datapayload)
    print(secondpost.reason, secondpost.url)
Answer (score: 0):
If you want this to be a reusable program for validating multiple request/response pairs, you'll need to structure it to work with objects. Wrap the functionality into three functions: one that sends the HTTP request, one that handles the HTTP response, and one that extracts the attributes from the HTTP response DOM. You can then call those functions in a loop and run the main script over a list of URLs.

Here is a good reference to get started: https://eliasdorneles.github.io/2014/08/30/web-scraping-with-scrapy---first-steps.html
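A minimal sketch of that three-function structure, built on the question's own `requests`/`BeautifulSoup` stack (the function names and the `urls` list are mine, chosen for illustration, not taken from the question's code):

```python
import requests
from bs4 import BeautifulSoup


def fetch(session, url):
    """Send the HTTP request and return the response, failing loudly on errors."""
    response = session.get(url)
    response.raise_for_status()
    return response


def parse(response):
    """Turn the HTTP response body into a DOM we can query."""
    return BeautifulSoup(response.text, "html.parser")


def extract_onclicks(soup):
    """Pull the onclick attributes out of the anchors the question scrapes."""
    return [a["onclick"] for a in soup.find_all("a", attrs={"class": "dispo"})]


def main(urls):
    """Run the fetch -> parse -> extract pipeline over a list of URLs."""
    results = {}
    with requests.Session() as session:
        for url in urls:
            soup = parse(fetch(session, url))
            results[url] = extract_onclicks(soup)
    return results
```

Because each stage takes and returns plain objects, you can unit-test `parse` and `extract_onclicks` against canned HTML without touching the network, and swap `fetch` for an authenticated variant that does the login POST first.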