Python recursive Scrapy spider does not get all pages

Time: 2014-12-14 19:16:19

Tags: python json scrapy

I am trying to write a spider for viagogo. On a page like this one, for example: http://www.viagogo.com/Concert-Tickets/Rock-and-Pop, I don't see all of the shows; I have to click "Next" to load the rest of the results. I opened Wireshark and saw that the click sends a JSON request containing {"method":"GetGridData"} to the same address. I tried to get all of the results with Scrapy, but I always only get the first batch. Here is my code:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from viagogo.items import ViagogoItem
from scrapy.http import Request

class viagogoSpider(CrawlSpider):
    name="viagogo"
    allowed_domains=['viagogo.com']
    start_urls = ["http://www.viagogo.com/Concert-Tickets"]
    rules = (
        # Running on each subject in title, such as Rock in music
        Rule(SgmlLinkExtractor(restrict_xpaths=('//a[@class="t xs"]')), callback='Parse_Subject_Tickets', follow=True),
    )

    def Parse_Subject_Tickets(self, response):
        # Grab the subject page's title and URL
        item = ViagogoItem()
        item["title"] = response.xpath('//title/text()').extract()
        item["link"] = response.url
        # Try to trigger the GetGridData call by re-requesting the same URL
        yield Request(response.url, callback=self.Parse_artists_Tickets, meta={"method": "GetGridData"}, dont_filter=True)

    def Parse_artists_Tickets(self, response):
        print response.body

The rule does pick up all of the Concert-Tickets/XXXX pages, and in Parse_Subject_Tickets I try to build the JSON request, but what Parse_artists_Tickets prints is exactly the original page, not the next batch of artists...
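
My guess is that I need to reproduce that XHR as a real POST with a JSON body instead of passing the method name through meta. Something along these lines is what I had in mind (an untested sketch; the headers and the exact payload are only what I think the capture showed, and the real call probably carries more fields such as a page number):

import json  # would go with the other imports at the top of the file

    # Drop-in replacement for Parse_Subject_Tickets inside the spider class
    def Parse_Subject_Tickets(self, response):
        item = ViagogoItem()
        item["title"] = response.xpath('//title/text()').extract()
        item["link"] = response.url
        yield item
        # Re-send the "Next" XHR as a POST with the JSON body instead of meta;
        # the payload and headers here are my guess from the sniffer output
        yield Request(
            response.url,
            method='POST',
            body=json.dumps({"method": "GetGridData"}),
            headers={'Content-Type': 'application/json',
                     'X-Requested-With': 'XMLHttpRequest'},
            callback=self.Parse_artists_Tickets,
            dont_filter=True,
        )

If that is roughly right, Parse_artists_Tickets should receive the grid data instead of the original HTML, but I am not sure this is the correct way to do it in Scrapy.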

Any ideas?

Thanks!

0 Answers:

No answers yet