I am trying to run a spider from a Python script.
This is my folder structure:
scraping.py
blogs/
    __init__.py
    blogs/
        scrapy.cfg
        __init__.py
        items.py
        settings.py
        spiders/
            __init__.py
            spider_blog.py
This is the test snippet from scraping.py (it lives inside a class):
def spider_blog(self):
    parse()
And this is what I have in spider_blog.py:
import scrapy

from blogs.items import PitchforkItem


class PitchforkSpider(scrapy.Spider):
    name = "pitchfork_reissues"
    allowed_domains = ["pitchfork.com"]

    # creates objects for each URL listed here
    start_urls = [
        "http://pitchfork.com/reviews/best/reissues/?page=1",
        "http://pitchfork.com/reviews/best/reissues/?page=2",
        "http://pitchfork.com/reviews/best/reissues/?page=3",
    ]

    def parse(self, response):
        for sel in response.xpath('//div[@class="album-artist"]'):
            item = PitchforkItem()
            item['artist'] = sel.xpath('//ul[@class="artist-list"]/li/text()').extract()
            item['reissue'] = sel.xpath('//h2[@class="title"]/text()').extract()
            yield item
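For reference, the PitchforkItem used above would come from items.py in the project package; a minimal sketch of that class, inferred only from the two fields the spider assigns:

# items.py -- sketch inferred from the fields used in parse()
import scrapy


class PitchforkItem(scrapy.Item):
    artist = scrapy.Field()   # artist names taken from the artist-list entries
    reissue = scrapy.Field()  # review titles taken from the h2 headings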
What is the correct syntax for importing parse() from the spider_blog.py module into scraping.py and calling it?
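For reference, a Scrapy spider's parse() is not normally imported and called by hand; the documented way to drive a spider from an external script is scrapy.crawler.CrawlerProcess. A minimal sketch, assuming the script is executed from inside the Scrapy project so that get_project_settings() can locate scrapy.cfg and settings.py:

# scraping.py -- minimal sketch: run the spider with CrawlerProcess
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def run_pitchfork_spider():
    # Load the project's settings.py and set up the crawler machinery.
    process = CrawlerProcess(get_project_settings())
    # The spider is referenced by its name attribute; Scrapy's spider loader
    # finds the class and calls parse() itself for every downloaded response,
    # so parse() never needs to be imported or invoked directly.
    process.crawl("pitchfork_reissues")
    process.start()  # blocks until the crawl is finished


if __name__ == "__main__":
    run_pitchfork_spider()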