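I'm trying to fetch data from a website that uses Ajax. The page loads, and then JavaScript requests the content. For details, see this page: https://www.tele2.no/mobiltelefon.aspx
The problem is that when I try to simulate this process by calling this URL: https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters
I get a 400 response telling me that the request is not allowed. Here is my code: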
# -*- coding: utf-8 -*-
import scrapy
import json

class Tele2Spider(scrapy.Spider):
    name = "tele2"
    #allowed_domains = ["tele2.no/mobiltelefon.aspx"]
    start_urls = (
        'https://www.tele2.no/mobiltelefon.aspx/',
    )

    def parse(self, response):
        url = 'https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters'
        my_data = "{filters: []}"
        req = scrapy.Request( url, method='POST', body=json.dumps(my_data), headers={'X-Requested-With': 'XMLHttpRequest','Content-Type':'application/json'}, callback=self.parser2)
        yield req

    def parser2(self, response):
        print "test"
I'm new to Scrapy and Python, so I may be missing something obvious.
Answer 0 (score: 3)
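The key problem is the missing quotes around filters in the body. JSON requires object keys to be double-quoted, so {filters: []} is not valid JSON. On top of that, calling json.dumps() on a string encodes it as a JSON string rather than a JSON object. Send a valid JSON literal as the body instead: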
url = 'https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters'
req = scrapy.Request(url,
                     method='POST',
                     body='{"filters": []}',
                     headers={'X-Requested-With': 'XMLHttpRequest',
                              'Content-Type': 'application/json; charset=UTF-8'},
                     callback=self.parser2)
yield req
Alternatively, you can define it as a dictionary and then call json.dumps() to dump it to a string:
params = {"filters": []}
req = scrapy.Request(url,
                     method='POST',
                     body=json.dumps(params),
                     headers={'X-Requested-With': 'XMLHttpRequest',
                              'Content-Type': 'application/json; charset=UTF-8'},
                     callback=self.parser2)
As proof, here is what I get on the console:
2014-12-30 12:30:38-0500 [tele2] DEBUG: Crawled (200) <GET https://www.tele2.no/mobiltelefon.aspx/> (referer: None)
2014-12-30 12:30:42-0500 [tele2] DEBUG: Crawled (200) <POST https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters> (referer: https://www.tele2.no/mobiltelefon.aspx/)
test
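To see why the body from the question differs from the two corrected forms, the three can be compared using only the standard json module, without Scrapy or a network call. This is a minimal sketch: the variable names question_body, literal_body and dict_body are illustrative and do not appear in the original code, and it says nothing about how the server itself parses requests.

```python
import json

# Body as built in the question: a Python string with an unquoted key,
# passed through json.dumps().
question_body = json.dumps("{filters: []}")

# Body written by hand as a JSON literal (first fix).
literal_body = '{"filters": []}'

# Body built from a dictionary (second fix).
dict_body = json.dumps({"filters": []})

# json.dumps() on a str yields a JSON *string*: the text is wrapped in an
# extra pair of double quotes, so it decodes back to a str, not an object.
print(question_body)
print(type(json.loads(question_body)).__name__)

# The dictionary form decodes back to a dict with a "filters" key.
print(dict_body)
print(type(json.loads(dict_body)).__name__)

# Both corrected forms describe the same JSON object.
print(json.loads(literal_body) == json.loads(dict_body))
```

Printing the body before building the request is a quick way to catch this kind of double encoding.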