Question

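Using Scrapy, I want to follow all the app links on a web page and scrape the content behind each link.

In general there are two steps:

Get all the links and extract each one
Follow each link to get its content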
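
The first of the two steps above can be sketched without Scrapy. This is a minimal illustration using only the standard library; the function name `extract_app_links`, the sample HTML, and the `com.example.*` IDs are made up for the example, and the regular expression assumes double-quoted `href` attributes. In a spider, each returned URL would then become a new request (step two).

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://play.google.com"  # host taken from the URLs in the question


def extract_app_links(html, base_url=BASE_URL):
    """Step 1: collect unique absolute URLs of app detail pages, in page order."""
    hrefs = re.findall(r'href="(/store/apps/details[^"]*)"', html)
    seen = set()
    result = []
    for href in hrefs:
        if href not in seen:  # skip links already collected
            seen.add(href)
            result.append(urljoin(base_url, href))
    return result


# Hypothetical sample page: two distinct apps, one duplicate, one unrelated link.
sample_html = (
    '<a href="/store/apps/details?id=com.example.one">One</a>'
    '<a href="/store/apps/details?id=com.example.two">Two</a>'
    '<a href="/store/apps/details?id=com.example.one">One again</a>'
    '<a href="/store/music">Music</a>'
)
app_links = extract_app_links(sample_html)
print(app_links)
```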

The problem is that those links are loaded via AJAX, so I used 'formdata' to simulate the XHR request and stored the links in an item.
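
To make the paging idea concrete, here is a rough sketch of how the form fields for each simulated XHR request could be built. The field names (`start`, `num`, `numChildren`, `ipf`, `xhr`, `token`) are the ones used in my code; the helper name `build_form_data` and the empty default token are only placeholders for illustration, and a real token would have to be supplied by the caller.

```python
from urllib.parse import urlencode


def build_form_data(page, page_size=60, token=""):
    """Return the POST fields for one page of results (all values as strings)."""
    return {
        'start': str(page * page_size),  # offset of the first result on this page
        'num': str(page_size),           # number of results requested
        'numChildren': '0',
        'ipf': '1',
        'xhr': '1',
        'token': token,                  # placeholder; must be filled in by the caller
    }


# Build the payloads for the first three pages and encode one as a form body.
payloads = [build_form_data(i) for i in range(3)]
body = urlencode(payloads[1])
print(body)
```

All values are strings because form-encoded requests carry text fields only.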

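I have finished most of the code and can now get all the links, thanks to someone who helped me here.

However, when I try to follow the links to get more content, everything goes wrong.

Here is my code.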

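import scrapy

# googleAppItem is the project's Item class (defined elsewhere, not shown).
# The methods below belong to the spider class.

# Scrapy calls start_requests() without a response argument.
def start_requests(self):
    for i in range(0, 10):
        # FormRequest accepts formdata and sends it as a POST body.
        yield scrapy.FormRequest(
            url="https://play.google.com/store/apps/category/GAME/collection/topselling_new_free?authuser=0",
            formdata={'start': str(i*60), 'num': '60', 'numChildren': '0', 'ipf': '1', 'xhr': '1', 'token': 'm1VdlomIcpZYfkJT5dktVuqLw2k:1455483261011'},
            callback=self.parse)

def parse(self, response):
    seen = set()
    links = response.xpath('//a/@href').re(r'/store/apps/details.*')
    for l in links:
        if l not in seen:
            seen.add(l)
            # Create a fresh item per link so requests do not share one object.
            item = googleAppItem()
            item['url'] = l
            request = scrapy.Request("http://play.google.com" + str(l), callback=self.parse_every_app)
            request.meta['item'] = item
            # Yield one request per link; the loop continues after each yield.
            yield request

def parse_every_app(self, response):
    # Retrieve the item passed from parse(); extraction of further fields goes here.
    item = response.meta['item']
    yield item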
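
For a callback like `parse`, it matters whether the loop hands back its requests with `return` or with `yield`: a `return` inside the loop ends the function on the first matching link, while `yield` produces one result per link. The following stand-alone sketch (plain Python, no Scrapy; the function names and sample links are made up for illustration) shows the difference.

```python
def follow_with_return(links):
    # return leaves the function during the first iteration
    for link in links:
        return "request for " + link


def follow_with_yield(links):
    # yield produces a value and then resumes the loop
    for link in links:
        yield "request for " + link


sample_links = ["/store/apps/details?id=a", "/store/apps/details?id=b"]
first_only = follow_with_return(sample_links)
all_requests = list(follow_with_yield(sample_links))
print(first_only)
print(all_requests)
```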

Can anyone help me?

Here is a screenshot of cmd when the code fails.

Follow AJAX content in a web page and continue scraping

0 answers: