Scrapy - Importing a spider into a module

Time: 2016-09-23 15:54:41

Tags: scrapy scrapy-spider

I am trying to deploy a spider from within a Python script.

Here is my folder structure:

scraping.py
blogs/
     __init__.py
     scrapy.cfg
     blogs/
          __init__.py
          items.py
          settings.py
          spiders/
                 __init__.py
                 spider_blog.py

Here is the test snippet from scraping.py (it belongs to a class):

def spider_blog(self):

    parse()

This is what I have in spider_blog.py:

import scrapy

# Assumed location of the item class (items.py in the inner blogs package)
from blogs.items import PitchforkItem


class PitchforkSpider(scrapy.Spider):
    name = "pitchfork_reissues"
    allowed_domains = ["pitchfork.com"]
    # Scrapy creates a request for each URL listed here
    start_urls = [
        "http://pitchfork.com/reviews/best/reissues/?page=1",
        "http://pitchfork.com/reviews/best/reissues/?page=2",
        "http://pitchfork.com/reviews/best/reissues/?page=3",
    ]

    def parse(self, response):
        for sel in response.xpath('//div[@class="album-artist"]'):
            item = PitchforkItem()
            item['artist'] = sel.xpath('//ul[@class="artist-list"]/li/text()').extract()
            item['reissue'] = sel.xpath('//h2[@class="title"]/text()').extract()
            # Yield inside the loop so every matched block produces an item
            yield item

What is the correct syntax for importing parse() from spider_blog.py into the scraping.py module?
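
For illustration only, here is a minimal sketch of the import itself. Since parse() is a method of PitchforkSpider, it is the class that gets imported, not the method. The sketch assumes the directory containing scrapy.cfg is put on sys.path so that the inner package is importable as `blogs`. To stay runnable without Scrapy, it rebuilds the layout in a temporary directory using a plain stand-in class; the helper names (`build_demo_project`, `load_spider_class`, `demo`) and the stub spider are hypothetical, not part of the original project.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_project(root: Path) -> None:
    """Recreate the question's layout with a Scrapy-free stub spider (hypothetical)."""
    package = root / "blogs" / "blogs"
    spiders = package / "spiders"
    spiders.mkdir(parents=True)
    (package / "__init__.py").write_text("")
    (spiders / "__init__.py").write_text("")
    (spiders / "spider_blog.py").write_text(
        "class PitchforkSpider:\n"
        "    name = 'pitchfork_reissues'\n"
        "    def parse(self, response):\n"
        "        yield {'reissue': response}\n"
    )


def load_spider_class(project_dir: Path):
    """Import the spider class, given the directory that holds scrapy.cfg."""
    # Make the inner package importable as `blogs`
    sys.path.insert(0, str(project_dir))
    importlib.invalidate_caches()
    module = importlib.import_module("blogs.spiders.spider_blog")
    # Equivalent to: from blogs.spiders.spider_blog import PitchforkSpider
    return module.PitchforkSpider


def demo():
    """Build the stub project, import the class, and call parse() on an instance."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_project(root)
        spider_cls = load_spider_class(root / "blogs")
        spider = spider_cls()
        # parse() is a generator method, so it needs an instance and a response
        return list(spider.parse("fake response"))
```

In a real project the imported class would normally be handed to Scrapy rather than having parse() called by hand, for example with `CrawlerProcess` from `scrapy.crawler`: `process.crawl(PitchforkSpider)` followed by `process.start()`. Scrapy then downloads the start URLs and calls parse() with each response.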

0 answers:

No answers