This is the starting point of my scraping process:
https://www.storiaimoveis.com.br/alugar/brasil
This is the AJAX call that returns the data for each page in JSON format (it is the API URL used in start_urls below).
My POST requests fail with a 404 error. These requests need a payload, and that is what is giving me trouble. I have always managed to work around it somehow, but now I am trying to understand what I am getting wrong.
My question is: do I need to call json.dumps(payload) on the payload before sending it, or can I send it as a plain dictionary? Here is the relevant part of my code:
import json

import scrapy
from scrapy.spiders import CrawlSpider


class MySpider(CrawlSpider):
    name = 'myspider'

    start_urls = [
        'https://www.storiaimoveis.com.br/api/search?fields=%24%24meta.geo.postalCodeAddress.city%2C%24%24meta.geo.postalCodeAddress.neighborhood%2C%24%24meta.geo.postalCodeAddress.street%2C%24%24meta.location%2C%24%24meta.created%2Caddress.number%2Caddress.postalCode%2Caddress.neighborhood%2Caddress.state%2Cmedia%2ClivingArea%2CtotalArea%2Ctypes%2Coperation%2CsalePrice%2CrentPrice%2CnewDevelopment%2CadministrationFee%2CyearlyTax%2Caccount.logoUrl%2Caccount.name%2Caccount.id%2Caccount.creci%2Cgarage%2Cbedrooms%2Csuites%2Cbathrooms%2Cref&optimizeMedia=true&size=20&from=0&sessionId=5ff29d7e-88d0-54d5-2641-e203cafd6f4e'
    ]

    page = 1

    # JSON payload sent as the body of the POST request.
    payload = {"locations": [{"geo": {"top_left": {"lat": 5.2717863,
                                                   "lon": -73.982817},
                                      "bottom_right": {"lat": -34.0891,
                                                       "lon": -28.650543}},
                              "placeId": "ChIJzyjM68dZnAARYz4p8gYVWik",
                              "keywords": "Brasil",
                              "address": {"label": "Brasil", "country": "BR"}}],
               "operation": ["RENT"],
               "bathrooms": [],
               "bedrooms": [],
               "garage": [],
               "features": []}

    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'Referer': 'https://www.storiaimoveis.com.br/alugar/brasil',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
    }

    def parse(self, response):
        for url in self.start_urls:
            yield scrapy.Request(url=url,
                                 method='POST',
                                 headers=self.headers,
                                 body=json.dumps(self.payload),
                                 callback=self.parse_items)

    def parse_items(self, response):
        from scrapy.shell import inspect_response
        inspect_response(response, self)
        print(response.text)
Answer (score: 1)
Yes, you need to call json.dumps(payload), because the request body must be str or unicode, as the documentation explains: https://docs.scrapy.org/en/latest/topics/request-response.html#request-objects
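To make the difference concrete, here is a small sketch (not part of the original answer; the shortened URL and trimmed payload are placeholders standing in for the question's values):

import json
import scrapy

# Placeholders; the real spider would use the full api/search URL and the
# complete payload shown in the question above.
url = 'https://www.storiaimoveis.com.br/api/search?size=20&from=0'
payload = {"operation": ["RENT"], "locations": []}

# Passing the dict directly is rejected: the Request body must be str/bytes,
# so Scrapy raises a TypeError here.
# scrapy.Request(url, method='POST', body=payload)

# Serializing the dict first produces a valid JSON body.
request = scrapy.Request(
    url,
    method='POST',
    headers={'Content-Type': 'application/json'},
    body=json.dumps(payload),
)

Recent Scrapy versions (1.8 and later) also ship scrapy.http.JsonRequest, which accepts a data= dict and handles the serialization and the Content-Type header for you.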
However, in your case I believe the requests fail because these two headers are missing: Content-Type and Referer.
What I usually do to find the right request headers is replay the request with curl (or a similar tool), adjusting the headers until I get the correct combination. In this case, those two headers seem to be enough to get an HTTP 200 response status.
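A minimal sketch of what such a request could look like, combining the question's spider with the two headers named above (the shortened API URL and the trimmed payload below are placeholders, not the original values):

import json
import scrapy


class StoriaSpider(scrapy.Spider):
    name = 'storia'

    # Placeholder URL; the real request needs the full api/search query string
    # from the question.
    api_url = 'https://www.storiaimoveis.com.br/api/search?size=20&from=0'

    # Trimmed version of the question's payload.
    payload = {
        "locations": [{"keywords": "Brasil",
                       "address": {"label": "Brasil", "country": "BR"}}],
        "operation": ["RENT"],
        "bathrooms": [], "bedrooms": [], "garage": [], "features": [],
    }

    # The two headers discussed in the answer.
    headers = {
        'Content-Type': 'application/json',
        'Referer': 'https://www.storiaimoveis.com.br/alugar/brasil',
    }

    def start_requests(self):
        yield scrapy.Request(
            url=self.api_url,
            method='POST',
            headers=self.headers,
            body=json.dumps(self.payload),  # the body must be a string, not a dict
            callback=self.parse_items,
        )

    def parse_items(self, response):
        # The endpoint returns JSON; decode it and log the top-level keys
        # as a sanity check before extracting the actual listing fields.
        data = json.loads(response.text)
        self.logger.info('JSON keys: %s', list(data))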