GET request returns different JSON content

Time: 2017-08-14 07:11:24

Tags: get scrapy

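I am using Scrapy to crawl some data. Every time I open the product detail page in the browser and inspect this request made by the browser, it always returns the same, correct content, without any '?????' characters.
However, if I open that request directly in the browser, it returns the correct content for about 10 requests. After that, it returns wrong content in which parts of the values are replaced by '?????'. Can you explain why this happens? How can I make Scrapy behave like a real browser?

Here is the correct content: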

{"itemid": 43369300, "liked": false, "offer_count": 6, "videos": [], "image": "41dabd8fe9b7cbc2ab30501592f65a80", "image_list": ["41dabd8fe9b7cbc2ab30501592f65a80", "91bf75885fffd2b1fbcc55099457bc22", "f4516bb9667f8329f031ff75896a71fd", "d2639a1ffe75912873de6d8e011dc0dd", "38d00637b021e1701542a6afa7ae58f3", "10ab99e3bd211bd4dd63993555d6454b"].....

Here is the wrong content:

{"itemid": 43369300, "liked": false, "offer_count": 10, "videos": [], "rating_star": 4.069458216402549, "image": "41dabd8fe9?????????????????????", "image_list": ["41dabd8fe9?????????????????????", "91bf75885f?????????????????????", "f4516bb966?????????????????????", "d2639a1ffe?????????????????????", "38d00637b0?????????????????????", "10ab99e3bd?????????????????????"].....

You can test with other requests as well: request1, request2, ...
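When testing, it helps to detect automatically which of the two responses came back. A minimal sketch, assuming (based only on the two samples above) that the degraded response can be recognised by '?' characters inside the image / image_list hashes; is_masked is a hypothetical helper name:

```python
import json


def is_masked(body: str) -> bool:
    """Return True if the item JSON looks like the degraded response.

    Assumption: the degraded response is recognisable by '?' characters
    in the image hashes, as in the two samples shown in the question.
    """
    data = json.loads(body)
    # Collect the main image hash plus every hash in image_list (missing -> empty).
    hashes = [data.get("image") or ""] + list(data.get("image_list") or [])
    return any("?" in h for h in hashes)
```

In a Scrapy callback one could, for example, check is_masked(response.text) and re-schedule the request with response.request.replace(dont_filter=True), ideally after slowing the crawl down (e.g. with the DOWNLOAD_DELAY setting), since the masking appears only after roughly 10 requests.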

1 answer:

Answer 0 (score: 0)

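The problem is probably that you are calling the API directly and the site is detecting and blocking scrapers. If I hit the URL below 10-15 times with curl and the extra headers, it works fine: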

curl 'https://xxxx.vn/api/v0/shop/6088300/item/43369300/shipping_info_to_address/?state=H%C3%A0%20N%E1%BB%99i&city=Huy%E1%BB%87n%20Ba%20V%C3%AC&district=' \
-H 'Pragma: no-cache' \
-H 'DNT: 1' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Accept-Language: en-US,en;q=0.8' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36' \
-H 'X-API-SOURCE: pc' \
-H 'Accept: */*' \
-H 'Cache-Control: no-cache' \
-H 'X-Requested-With: XMLHttpRequest'  \
-H 'Referer: https://xxx.vn/H%E1%BB%99p-%C4%91%E1%BB%B1ng-gi%C3%A0y-trong-su%E1%BB%91t-theo-d%C3%B5i-c%C3%B3-gi%C3%A1-t%E1%BB%91t-i.6088300.43369300' \
--compressed

So I think these are the 4 important headers you should send:
'X-Requested-With: XMLHttpRequest'
'X-API-SOURCE: pc'
'Referer: https://xxx.vn/H%E1%BB%99p-%C4%91%E1%BB%B1ng-gi%C3%A0y-trong-su%E1%BB%91t-theo-d%C3%B5i-c%C3%B3-gi%C3%A1-t%E1%BB%91t-i.6088300.43369300'
'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36'

Send these headers when you create the requests in Scrapy.
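A minimal sketch of bundling those 4 headers for reuse. The header values are copied from the curl command above; api_headers and build_request are hypothetical helper names, and the Referer is assumed to have to match the product page of the item being requested. So that the snippet runs without Scrapy installed, it builds a standard-library urllib Request as a stand-in:

```python
import urllib.request

# Static headers copied verbatim from the answer's curl command.
BASE_HEADERS = {
    "X-Requested-With": "XMLHttpRequest",
    "X-API-SOURCE": "pc",
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"
    ),
}


def api_headers(referer: str) -> dict:
    """Return the headers for one API call (hypothetical helper).

    The Referer differs per product page, so it is passed in
    rather than hard-coded.
    """
    return {**BASE_HEADERS, "Referer": referer}


def build_request(url: str, referer: str) -> urllib.request.Request:
    # Stand-in for a Scrapy request; note that urllib stores header names
    # via str.capitalize(), e.g. "X-Requested-With" -> "X-requested-with".
    return urllib.request.Request(url, headers=api_headers(referer))
```

In a spider, the same dict would be passed as the headers argument, e.g. yield scrapy.Request(api_url, headers=api_headers(product_url), callback=self.parse_item), where parse_item is a hypothetical callback. Alternatively, the static headers can go in the DEFAULT_REQUEST_HEADERS setting and the browser string in USER_AGENT, leaving only the Referer to be set per request.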