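I'm trying to create a generic spider that handles the most common tasks, plus site-specific spiders that inherit from it and declare the site-specific variables.
Here is genericspider.py: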
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import Spider, CrawlSpider

class GenericProductSpider(scrapy.Spider):

    def __init__(self, start_urls=[], finditemprop='', keywords='', **kwargs):
        CrawlSpider.__init__(self, **kwargs)
        print("\n\n Init Generic \n")
Then I put specificspider.py in the same directory as the generic spider:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import Spider, CrawlSpider
from .genericfabric import GenericFabricsSpider

class SpecificSpider(GenericProductSpider):

    def __init__(self, **kwargs):
        print("\n init specific \n")
        name = "specific1"
        start_urls = ['http://www.specificdomian.com',]
        super(SpecificSpider, self).__init__(name, start_urls, **kwargs)
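I seem to be misunderstanding how to call the superclass's initializer correctly. I get various error messages, and the generic spider's __init__ method is never executed.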
Answer 0 (score: 0)
Actually... it seems to work fine. It was probably just a problem with the arguments.
Working code for the superclass:
# -*- coding: utf-8 -*-
from scrapy.spiders import Spider
from test.items import TestItem

class TestsuperSpider(Spider):
    name = "testsuper"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]
    supervar = "meine super var"

    def __init__(self):
        print("super init")

    def parse(self, response):
        print("super Parse")

    def supermethod(self, subvar):
        print("\n\n Supermethod \n\n ")
        print(self.supervar + " - " + subvar)
The subclass:
# -*- coding: utf-8 -*-
from scrapy.spiders import Spider
from test.items import TestItem
from test.spiders.testsuper import TestsuperSpider

class TestsubSpider(TestsuperSpider):
    name = "testsub"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]
    subvar = "subvar"

    def __init__(self):
        print("sub init")
        super(TestsubSpider, self).__init__()

    def parse(self, response):
        super(TestsubSpider, self).supermethod(self.subvar)
        print("sub Parse")
It still needs to be cleaned up and adapted to its actual purpose, but at least the code runs as expected.
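For the "argument problem" in the original generic/specific pair, here is a minimal sketch of one way it could be arranged. In the question, super().__init__(name, start_urls, **kwargs) passes name positionally, so it lands in the generic spider's start_urls parameter, and start_urls lands in finditemprop. Keeping the site-specific values as class attributes and forwarding only keyword arguments avoids that. To stay runnable without Scrapy installed, BaseSpider below is a hypothetical stand-in for scrapy.Spider (an assumption for illustration, mimicking only its name=None, **kwargs initializer); in a real project GenericProductSpider would inherit from scrapy.Spider instead.

```python
class BaseSpider:
    """Hypothetical stand-in for scrapy.Spider, so the sketch runs without Scrapy."""

    name = None

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Store any extra keyword arguments as instance attributes.
        self.__dict__.update(kwargs)
        if not hasattr(self, "start_urls"):
            self.start_urls = []


class GenericProductSpider(BaseSpider):
    def __init__(self, start_urls=None, finditemprop="", keywords="", **kwargs):
        # Call the initializer of the class this spider actually inherits from.
        super().__init__(**kwargs)
        # Only override the class-level start_urls if one was passed in.
        if start_urls is not None:
            self.start_urls = list(start_urls)
        self.finditemprop = finditemprop
        self.keywords = keywords


class SpecificSpider(GenericProductSpider):
    # Site-specific values live on the class, not inside __init__.
    name = "specific1"
    start_urls = ["http://www.specificdomian.com"]

    def __init__(self, **kwargs):
        # Forward keyword arguments only, so nothing shifts position.
        super().__init__(**kwargs)


spider = SpecificSpider(keywords="fabric")
print(spider.name, spider.start_urls, spider.keywords)
```

With this layout both initializers run, and the subclass only needs its own __init__ if it has extra setup to do; otherwise it could be dropped entirely and the inherited one would be used.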