I've been working on a Scrapy project and so far everything has been running fine. However, I'm not satisfied with Scrapy's logging configuration options. At the moment I set LOG_FILE = 'my_spider.log' in my project's settings.py. When I run scrapy crawl my_spider on the command line, Scrapy produces one big log file for the entire crawl, which is not workable for me.
How can I use a custom Python logging handler together with the scrapy.log module? In particular, I'd like to use Python's logging.handlers.RotatingFileHandler so that the log data is split across several small files instead of having to deal with one huge file. Unfortunately, the documentation on Scrapy's logging facility is not very extensive. Many thanks in advance!
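For reference, the setup described above boils down to a single line in the project's settings.py (the file name is the one from the question):
# settings.py: one ever-growing log file for the whole crawl
LOG_FILE = 'my_spider.log'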
Answer 0 (score: 5)
You can route all of Scrapy's logging to a file by first disabling the root handler via scrapy.utils.log.configure_logging and then adding your own log handler.
In your Scrapy project's settings.py file, add the following code:
import logging
from logging.handlers import RotatingFileHandler
from scrapy.utils.log import configure_logging
LOG_ENABLED = False
# Disable default Scrapy log settings.
configure_logging(install_root_handler=False)
# Define your logging settings.
log_file = '/tmp/logs/CRAWLER_logs.log'
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
rotating_file_log = RotatingFileHandler(log_file, maxBytes=10485760, backupCount=1)
rotating_file_log.setLevel(logging.DEBUG)
rotating_file_log.setFormatter(formatter)
root_logger.addHandler(rotating_file_log)
You can also adjust the log level (say, from DEBUG to INFO) and the formatter as needed. To emit custom log messages from your spiders or pipelines, use ordinary Python logging, like this:
inside pipelines.py:
import logging
logger = logging.getLogger()
logger.info('processing item')
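Using the root logger works, but a module-level logger records which module each message came from via the %(name)s field of the formatter above. A small variant (the pipeline class name here is illustrative):
import logging
logger = logging.getLogger(__name__)

class MyPipeline:
    def process_item(self, item, spider):
        # Goes through the root logger's RotatingFileHandler set up in settings.py.
        logger.info('processing item from %s', spider.name)
        return item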
Hope this helps!
Answer 1 (score: 2)
You can integrate a custom log file like this (I don't know how to integrate a rotating handler, though):
In your spider class file:
from datetime import datetime
from scrapy import log
from scrapy.spider import BaseSpider
class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    def __init__(self, name=None, **kwargs):
        LOG_FILE = "scrapy_%s_%s.log" % (self.name, datetime.now())
        # remove the current log
        # log.log.removeObserver(log.log.theLogPublisher.observers[0])
        # re-create the default Twisted observer which Scrapy checks
        log.log.defaultObserver = log.log.DefaultObserver()
        # start the default observer so it can be stopped
        log.log.defaultObserver.start()
        # trick Scrapy into thinking logging has not started
        log.started = False
        # start the new log file observer
        log.start(LOG_FILE)
        # continue with the normal spider init
        super(ExampleSpider, self).__init__(name, **kwargs)

    def parse(self, response):
        ...
The resulting log file might look like this:
scrapy_example_2012-08-25 12:34:48.823896.log
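Note that scrapy.log and BaseSpider belong to the pre-1.0 Scrapy API. On Scrapy 1.0 and later, roughly the same per-run, timestamped file can be set up with the standard logging module, with a RotatingFileHandler bolted on for rotation; a sketch along those lines, with illustrative size and format choices:
import logging
from datetime import datetime
from logging.handlers import RotatingFileHandler

import scrapy
from scrapy.utils.log import configure_logging

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["http://www.example.com/"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Keep Scrapy from installing its own root handler.
        configure_logging(install_root_handler=False)
        # One timestamped file per run; strftime avoids the spaces and colons
        # that are awkward (or invalid on Windows) in file names.
        log_file = "scrapy_%s_%s.log" % (
            self.name, datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
        handler = RotatingFileHandler(log_file, maxBytes=10 * 1024 * 1024,
                                      backupCount=5)
        handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
        logging.getLogger().addHandler(handler)

    def parse(self, response):
        ...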
Answer 2 (score: 2)
Scrapy uses standard Python loggers, which means you can grab and modify them as you create your spider.
import scrapy
import logging
from logging.handlers import RotatingFileHandler
class SpiderSpider(scrapy.Spider):
    name = 'spider'
    start_urls = ['https://en.wikipedia.org/wiki/Spider']

    # Attach a rotating handler to the root logger: spider.log is rotated
    # at 1 KB, keeping up to three backups (spider.log.1, .2, .3).
    handler = RotatingFileHandler('spider.log', maxBytes=1024, backupCount=3)
    logging.getLogger().addHandler(handler)

    def parse(self, response):
        ...
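As written, the handler has no formatter, so records land in spider.log as bare messages. Attaching one follows the same pattern as in Answer 0; a one-line sketch, placed before addHandler:
handler.setFormatter(logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))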
...