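I'm a beginner, so I don't understand why I'm running into this problem. I'm trying to scrape the customer-provider chat logs from anyvan.com. A normal job page on the site looks like this. Clicking the pink "View" button in a bid conversation sends an AJAX request, which then loads the chat log. You can see this XHR request in the browser's developer tools under Network, filtered to XHR requests.
I'm using the following simple spider to reproduce that request with Scrapy, but it seems I'm being redirected to anyvan.com: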
from bs4 import BeautifulSoup
from scrapy import Request, Spider


class AVSpider(Spider):
    name = "anyvanscraper"
    allowed_domains = ["anyvan.com"]
    # This start URL is the job URL
    start_urls = ["http://www.anyvan.com/view-listing/1935650"]

    def parse(self, response):
        # This receives the response from the start URL, but we don't do anything with it.
        url = 'http://www.anyvan.com/ajax-bid-comment/bid/14916780'
        return Request(url, callback=self.parse_stores)

    def parse_stores(self, response):
        # Save the prettified response body so it can be inspected.
        soup = BeautifulSoup(response.body, 'html.parser')
        with open('html.txt', 'w', encoding='utf-8') as f:
            f.write(soup.prettify())
Thanks in advance, 埃伦
Answer 0 (score: 2)
Add this header to the request:
"X-Requested-With": "XMLHttpRequest"
Something like this should work:
return Request('http://www.anyvan.com/ajax-bid-comment/bid/14916780', callback=self.parse_stores, headers={"X-Requested-With": "XMLHttpRequest"})
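To see why the header can matter, here is a small offline sketch using only the Python standard library. It assumes, purely for illustration, that a server redirects any request to an AJAX endpoint that lacks "X-Requested-With: XMLHttpRequest"; whether anyvan.com works exactly this way is not confirmed here. The names Handler, fetch and run_demo, the local test server and the path /ajax-bid-comment/bid/1 are all hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that gates an AJAX endpoint."""

    def _send(self, status, body=b"", location=None):
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/":
            self._send(200, b"home page")
        elif self.headers.get("X-Requested-With") == "XMLHttpRequest":
            self._send(200, b"chat log")
        else:
            # Assumed behaviour: non-AJAX requests are redirected to the home page.
            self._send(302, location="/")

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(url, headers=None):
    """Return (final_url, body) for a GET request, following redirects."""
    # An empty ProxyHandler makes sure the local request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers=headers or {})
    with opener.open(request) as response:
        return response.geturl(), response.read().decode("utf-8")


def run_demo():
    """Fetch the same endpoint without and with the AJAX header."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/ajax-bid-comment/bid/1" % server.server_address[1]
    try:
        plain = fetch(url)
        ajax = fetch(url, {"X-Requested-With": "XMLHttpRequest"})
    finally:
        server.shutdown()
        server.server_close()
    return plain, ajax


if __name__ == "__main__":
    for final_url, body in run_demo():
        print(final_url, "->", body)
```

If the spider still ends up on the home page after adding the header, comparing the remaining request headers and cookies shown in the Network tab against what Scrapy sends is a reasonable next step.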