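I am building a recursive web spider with an optional login. I want to make most settings dynamic through a JSON config file.
In my __init__ function I read this file and try to populate all variables from it. This works for everything except the Rules.
Scrapy still crawls pages that match the patterns in config['rules'], and therefore also hits the logout page:

import json
import logging
import os

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule
from scrapy.spiders.init import InitSpider

class CrawlpySpider(InitSpider):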
    ...
    #----------------------------------------------------------------------
    def __init__(self, *args, **kwargs):
        """Constructor: overwrite parent __init__ function"""
        # Call parent init
        super(CrawlpySpider, self).__init__(*args, **kwargs)
        # Get command line arg provided configuration param
        config_file = kwargs.get('config')
        # Validate configuration file parameter
        if not config_file:
            logging.error('Missing argument "-a config"')
            logging.error('Usage: scrapy crawl crawlpy -a config=/path/to/config.json')
            self.abort = True
        # Check if it is actually a file
        elif not os.path.isfile(config_file):
            logging.error('Specified config file does not exist')
            logging.error('Not found in: "' + config_file + '"')
            self.abort = True
        # All good, read config
        else:
            # Load json config and convert JSON to dict
            with open(config_file) as fpointer:
                config = json.load(fpointer)
            # config['rules'] is simply a string array which looks like this:
            # config['rules'] = [
            #     'password',
            #     'reset',
            #     'delete',
            #     'disable',
            #     'drop',
            #     'logout',
            # ]
            CrawlpySpider.rules = (
                Rule(
                    LinkExtractor(
                        allow_domains=(self.allowed_domains),
                        unique=True,
                        deny=tuple(config['rules'])
                    ),
                    callback='parse',
                    follow=False
                ),
            )

So the specified pages are not denied. What am I missing here?
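
As a quick check of the deny list itself, the sketch below mimics how such patterns are typically applied: each entry is treated as a regular expression and searched against the absolute URL, which is how Scrapy's LinkExtractor documents its deny argument. The helper is_denied, the inline JSON string, and the example URLs are hypothetical. This is a plain-Python stand-in for illustration, not Scrapy's implementation.

```python
import json
import re

# Hypothetical config content mirroring the config['rules'] list above.
CONFIG_JSON = '{"rules": ["password", "reset", "delete", "disable", "drop", "logout"]}'


def is_denied(url, patterns):
    """Return True if any pattern, used as a regex, is found anywhere in url."""
    return any(re.search(pattern, url) for pattern in patterns)


config = json.loads(CONFIG_JSON)
deny = tuple(config["rules"])

# Example URLs are made up for illustration.
urls = [
    "http://example.com/index.php",
    "http://example.com/user/logout",
    "http://example.com/account/password/reset",
]
allowed = [url for url in urls if not is_denied(url, deny)]
print(allowed)
```

Because the patterns are searched rather than matched in full, a short entry such as 'drop' would also exclude a URL containing 'dropdown'. If the list behaves as expected in isolation like this, the problem is more likely in how the rules reach the spider than in the patterns.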
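Update
I have already tried setting both CrawlpySpider.rules = ... and self.rules = ... inside __init__. Neither variant works.
The spider subclasses InitSpider and the rule is built with LinkExtractor.
I even tried this in my ... function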
Answer 0 (score: 0)
You are setting a class attribute where you want to set an instance attribute:
# this:
CrawlpySpider.rules = (
# should be this:
self.rules = (
<...>
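
To make the distinction concrete, here is a small plain-Python sketch that does not use Scrapy. BaseSpider is a hypothetical stand-in that copies rules once inside its own __init__, on the assumption that the real base class behaves similarly (Scrapy's CrawlSpider compiles its rules during __init__). Under that assumption, both where and when rules is assigned matter.

```python
class BaseSpider:
    """Hypothetical stand-in for a spider base class; not Scrapy code."""
    rules = ()

    def __init__(self):
        # Snapshot the rules once, at construction time.
        self._compiled = tuple(self.rules)


class LateSpider(BaseSpider):
    def __init__(self, deny):
        super().__init__()          # snapshot is taken here, rules still empty
        self.rules = tuple(deny)    # instance attribute, but set too late


class EarlySpider(BaseSpider):
    def __init__(self, deny):
        self.rules = tuple(deny)    # instance attribute, set before the snapshot
        super().__init__()


class ClassAttrSpider(BaseSpider):
    def __init__(self, deny):
        super().__init__()
        ClassAttrSpider.rules = tuple(deny)  # shared by every instance


late = LateSpider(["logout"])
early = EarlySpider(["logout"])
first = ClassAttrSpider(["logout"])
second = ClassAttrSpider(["reset"])

print(late._compiled)    # ()
print(early._compiled)   # ('logout',)
print(first.rules)       # ('reset',), overwritten when the second instance was built
print(second._compiled)  # ('logout',), the class-level value left by the first instance
```

In Scrapy terms this suggests assigning self.rules before calling the parent constructor, and doing so on a spider class that actually processes rules. Rules are a CrawlSpider feature, so it is worth checking the Scrapy documentation to confirm whether InitSpider honours them at all.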