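I want to pass arguments to my spider so that it searches a site based on the input, but I am having trouble setting instance variables. It looks like __init__ is called twice: the first time with the arguments I pass, and a second time, apparently by a Scrapy function, which does not pass my input and resets self.a and self.b to the default value 'f'.
I read in another post that Scrapy automatically sets any passed variables as instance attributes, but I have not found a way to access them.
Is there a solution to this, or a simpler approach that I am missing?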
import scrapy
from scrapy_splash import SplashRequest
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


class PracticeSpider(scrapy.Spider):
    name = 'practice'

    def __init__(self, a='f', b='f', *args, **kwargs):
        super(PracticeSpider, self).__init__(*args, **kwargs)
        self.a = a
        self.b = b
        print(self.a)
        print(self.b)

    def start_requests(self):
        print(self.a)
        print(self.b)
        yield SplashRequest(''.join(["https://www.google.com/search?q=",
                                     self.a, "+", self.b]),
                            self.practice_parse, args={'wait': 0.5})

    def practice_parse(self, response):
        pass


# list of crawlers
TO_CRAWL = [PracticeSpider]
# crawlers that are running
RUNNING_CRAWLERS = []

process = CrawlerProcess(get_project_settings())
for spider in TO_CRAWL:
    # an already constructed instance is passed here
    process.crawl(spider(a='first', b='second'))
process.start()
Answer 0 (score: 1)
You may want to look at the meta parameter of Request, which is a dictionary:
def some_function(self, response):
    ...
    yield Request(url=page,
                  callback=self.parse_page,
                  meta={'var1': "value1", 'var2': "value2"})
Then, in the parse_page function, you can retrieve the variables as follows:
def parse_page(self, response):
    ...
    var1 = response.meta["var1"]
    var2 = response.meta["var2"]