I need to send a POST request to this URL:
http://lastsecond.ir/hotels/ajax
Here are the other parameters sent with this request:
formdata:
filter_score:
sort:reviewed_at
duration:0
page:1
base_location_id:1
request header:
:authority:lastsecond.ir
:method:POST
:path:/hotels/ajax
:scheme:https
accept:*/*
accept-encoding:gzip, deflate, br
accept-language:en-US,en;q=0.9,fa;q=0.8,ja;q=0.7
content-length:67
content-type:application/x-www-form-urlencoded; charset=UTF-8
cookie:_jsuid=2453861291; read_announcements=,11,11; _ga=GA1.2.2083988810.1511607903; _gid=GA1.2.1166842676.1513922852; XSRF-TOKEN=eyJpdiI6IlZ2TklPcnFWU3AzMlVVa0k3a2xcL2dnPT0iLCJ2YWx1ZSI6ImVjVmt2c05STWRTUnJod1IwKzRPNk4wS2lST0k1UTk2czZwZXJxT2FQNmppNkdUSFdPK29kU29RVHlXbm1McTlFSlM5VlIwbGNhVUozbXFBbld5c2tRPT0iLCJtYWMiOiI4YmNiMGQwMzdlZDgyZTE2YWNlMWY1YjdmMzViNDQwMmRjZGE4YjFmMmM1ZmUyNTQ0NmE1MGRjODFiNjMwMzMwIn0%3D; lastsecond-session=eyJpdiI6ImNZQjdSaHhQM1lZaFJIZzhJMWJXN0E9PSIsInZhbHVlIjoiK1NWdHJiUTdZQzBYeEsyUjE3QXFhUGJrQXBGcExDMVBXTjhpSVJLRlFnUjVqXC9USHBxNGVEZ3dwKzVGcG5yeU93VTZncG9wRGpvK0VpVnQ2b1ByVnh3PT0iLCJtYWMiOiI4NTFkYmQxZTFlMTMxOWFmZmU1ZjA1ZGZhNTMwNDFmZmU0N2FjMGVjZTg1OGU2NGE0YTNmMTc2MDA5NWM1Njg3In0%3D
origin:https://lastsecond.ir
referer:https://lastsecond.ir/hotels?score=&page=1&sort=reviewed_at&duration=0
user-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36
x-csrf-token:oMpQTG0wN0YveJIk2WhkesvzjZE2FqHkDqPiW8Dy
x-requested-with:XMLHttpRequest
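The result of this request is supposed to be a JSON file, but instead the request is redirected to its parent URL. I am sending the request with Scrapy and Python. Here is the Scrapy code: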
import scrapy
from scrapy import FormRequest


class HotelsSpider(scrapy.Spider):
    name = 'hotels'
    allowed_domains = ['lastsecond.ir']
    start_urls = ['http://lastsecond.ir/hotels']

    def parse(self, response):
        # Form fields copied from the browser's network panel
        data = {
            'filter_score': '',
            'sort': 'reviewed_at',
            'duration': '0',
            'page': '1',
            'base_location_id': '1'
        }
        # The CSRF token here is hard-coded from one browser session
        headers = {
            'user-agent': 'Mozilla/5.0',
            'x-csrf-token': 'oMpQTG0wN0YveJIk2WhkesvzjZE2FqHkDqPiW8Dy',
            'x-requested-with': 'XMLHttpRequest'
        }
        url = 'https://lastsecond.ir/hotels/ajax'
        return FormRequest(
            url=url,
            callback=self.parse_details,
            formdata=data,
            method="POST",
            headers=headers,
            dont_filter=True
        )

    def parse_details(self, response):
        # response.text replaces the deprecated body_as_unicode()
        data = response.text
        print(data)
        #f = open('output.json', 'w')
        #f.write(data)
        #f.close()
I have changed my code so that it gets a new csrf-token every time I send a request:
import scrapy
from scrapy import FormRequest


class HotelsSpider(scrapy.Spider):
    name = 'hotels'
    allowed_domains = ['lastsecond.ir']
    start_urls = ['http://lastsecond.ir/hotels']

    def parse(self, response):
        # Pull the fresh token out of the inline JavaScript on the page
        html = response.text
        marker = "var csrftoken = '"
        start = html.find(marker)
        start = start + len(marker)
        end = html.find("';", start)
        self.csrftoken = html[start:end]
        print('csrftoken:', self.csrftoken)
        yield self.ajax_request('1')

    def ajax_request(self, page):
        data = {
            'filter_score': '',
            'sort': 'reviewed_at',
            'duration': '0',
            'page': page,
            'base_location_id': '1'
        }
        headers = {
            'user-agent': 'Mozilla/5.0',
            'x-csrf-token': self.csrftoken,
            'x-requested-with': 'XMLHttpRequest'
        }
        url = 'https://lastsecond.ir/hotels/ajax'
        return FormRequest(
            url=url,
            callback=self.parse_details,
            formdata=data,
            method="POST",
            headers=headers,
            dont_filter=True
        )

    def parse_details(self, response):
        print(response.text)
Any help would be appreciated.
Answer 0: (score: 0)
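Are you sure the request you are sending is valid? The easiest way to find out is to copy the request from the browser as curl (F12 -> Network -> right-click the request -> Copy -> Copy as cURL) and convert it to Python with this tool, without using Scrapy.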
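As a rough sketch of what such a conversion gives you, the request from the question can be rebuilt with Python's standard library and inspected without sending it. The helper name `build_ajax_request` is made up for this illustration, and the token passed in is the one pasted in the question, which has most likely expired.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ajax_request(csrf_token, page=1):
    """Build (but do not send) the POST request shown in the question."""
    form = {
        'filter_score': '',
        'sort': 'reviewed_at',
        'duration': '0',
        'page': str(page),
        'base_location_id': '1',
    }
    # Same encoding the browser uses for application/x-www-form-urlencoded
    body = urlencode(form).encode('ascii')
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-CSRF-TOKEN': csrf_token,
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Mozilla/5.0',
    }
    return Request(
        'https://lastsecond.ir/hotels/ajax',
        data=body,
        headers=headers,
        method='POST',
    )


req = build_ajax_request('oMpQTG0wN0YveJIk2WhkesvzjZE2FqHkDqPiW8Dy')
print(req.get_method(), req.full_url)
print(len(req.data), req.data.decode('ascii'))
```

The encoded body is 67 bytes long, which matches the `content-length:67` header captured in the browser, so the form fields themselves are complete. What changes between sessions is the token and the cookies.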
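Answer 1: (score: 0)
Your mistake is sending the same 'x-csrf-token' in every request.
The 'x-csrf-token' is a protection against cross-site request forgery, and in practice it also blocks bots and scripts that replay an old request.
Wikipedia: Cross Site Request Forgery
Every time you open the page in a browser, the portal generates a new, unique 'x-csrf-token' that is valid only for a short time. You can't keep using the same 'x-csrf-token'.
In my answer to your previous question, I make a GET request to fetch the page and find a fresh X-CSRF-TOKEN.
See self.csrftoken in the code: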
def parse(self, response):
    print('url:', response.url)
    # response.text replaces the deprecated body_as_unicode()
    html = response.text
    # The token sits in inline JavaScript: var csrftoken = '...';
    marker = "var csrftoken = '"
    start = html.find(marker)
    start = start + len(marker)
    end = html.find("';", start)
    self.csrftoken = html[start:end]
    print('csrftoken:', self.csrftoken)
    yield self.create_ajax_request('1')
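The token lookup can also be written a little more defensively. The sketch below assumes the page still embeds the token as `var csrftoken = '...'`, which is the pattern the code above searches for; the sample HTML and the function name `extract_csrf_token` are invented for illustration.

```python
import re

# Assumed page fragment: only the `var csrftoken = '...'` pattern comes
# from the code above; the real markup around it may differ.
SAMPLE_HTML = "<script>var csrftoken = 'abc123';</script>"

TOKEN_RE = re.compile(r"var csrftoken = '([^']+)'")


def extract_csrf_token(html):
    """Return the token embedded in the page, or None if it is missing."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


print(extract_csrf_token(SAMPLE_HTML))
print(extract_csrf_token("<html></html>"))
```

With `str.find`, a missing marker returns -1 and the slice silently yields an unrelated piece of the page. Returning `None` lets the spider notice that the page layout changed before it sends a request with a bogus token.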
Later I use this token in the AJAX request.
def create_ajax_request(self, page):
    '''
    A helper method should `return` the Request rather than `yield` it
    (a `yield` would turn the helper into a generator);
    `parse` can then `yield` the returned Request.
    '''
    print('yield page:', page)
    url = 'https://lastsecond.ir/hotels/ajax'
    headers = {
        'X-CSRF-TOKEN': self.csrftoken,
        'X-Requested-With': 'XMLHttpRequest',
    }
    params = {
        'filter_score': '',
        'sort': 'reviewed_at',
        'duration': '0',
        'page': page,
        'base_location_id': '1',
    }
    return scrapy.FormRequest(
        url,
        callback=self.parse_details,
        formdata=params,
        headers=headers,
        dont_filter=True,
    )