How to crawl .xml.gz files with a sitemaps spider

Time: 2017-07-07 16:43:58

Tags: python-2.7 sitemap scrapy-spider

import scrapy
from scrapy.spiders import SitemapSpider


class MySpider(SitemapSpider):
    name = "sitemap"
    sitemap_urls = ['https://play.google.com/sitemaps/sitemaps-index-0.xml']

    # Send every sitemap URL containing '/app/' to parse_product.
    # (With sitemap_rules set, no separate parse() is needed; re-requesting
    # response.url from parse() would also be dropped as a duplicate.)
    sitemap_rules = [
        ('/app/', 'parse_product'),
    ]

    def parse_product(self, response):
        # extract_first(default='') avoids calling .encode() on None
        # when a selector matches nothing on the page
        yield {
            'applicationName': response.css('div.id-app-title ::text').extract_first(default='').encode("utf-8"),
            'publishedBy': response.css('div a.document-subtitle.primary span::text').extract_first(default='').encode("utf-8"),
        }
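For context on the title question: a `.xml.gz` sitemap is simply a gzip-compressed XML sitemap. As far as I know, Scrapy's `SitemapSpider` detects gzipped sitemap bodies and decompresses them itself, so the spider above should not need special handling for them. To inspect such a file outside Scrapy, here is a minimal standard-library sketch (Python 3), assuming the files follow the sitemaps.org 0.9 schema. The helpers `parse_sitemap_bytes` and `filter_locs` are hypothetical names for illustration, not part of Scrapy.

```python
import gzip
import re
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol (assumed here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap_bytes(body):
    """Hypothetical helper: return (kind, locs) for a sitemap body.

    The body may be plain XML or gzip-compressed (.xml.gz).
    kind is the root tag without its namespace ('sitemapindex' or 'urlset');
    locs is the list of <loc> values in document order.
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b; decompress if present.
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    root = ET.fromstring(body)
    kind = root.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


def filter_locs(locs, pattern):
    """Hypothetical helper: keep URLs where the regex pattern is found anywhere,
    similar in spirit to a ('/app/', ...) entry in sitemap_rules."""
    regex = re.compile(pattern)
    return [url for url in locs if regex.search(url)]
```

A sitemap index returns `kind == 'sitemapindex'` and its `locs` point at further (possibly `.xml.gz`) sitemaps; a `urlset` returns the page URLs, which `filter_locs(locs, '/app/')` narrows down.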

0 answers:

No answers yet.