Python scrapy sqlite3 pipeline prints error: cannot create weak reference to 'NoneType' object

Time: 2016-11-16 07:44:20

Tags: python sqlite scrapy weak-references

I have started using scrapy to crawl some web pages and am trying to save the scraped data to SQLite3 using a scrapy pipeline, but I ran into the problem below. Could you help me check it? I checked the type of `self.initialize()` and it is `NoneType`, but when I changed it so the type is a string or a boolean, it still gave a similar result, so I don't know where the root cause is. In the last section I have included the relevant weakref source code for reference:

     'scrapy.spidermiddlewares.depth.DepthMiddleware']
Unhandled error in Deferred:
2016-11-16 07:25:49 [twisted] CRITICAL: Unhandled error in Deferred:
2016-11-16 07:25:49 [twisted] CRITICAL:
Traceback (most recent call last):
  File "C:\Anaconda2\lib\site-packages\twisted\internet\defer.py", line 1260, in _inlineCallbacks
    result = g.send(result)
  File "C:\Anaconda2\lib\site-packages\scrapy\crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "C:\Anaconda2\lib\site-packages\scrapy\crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "C:\Anaconda2\lib\site-packages\scrapy\crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "C:\Anaconda2\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.scraper = Scraper(crawler)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "C:\Anaconda2\lib\site-packages\scrapy\middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "C:\Anaconda2\lib\site-packages\scrapy\middleware.py", line 40, in from_settings
    mw = mwcls()
  File "C:\Anaconda2\log\Spider\tutorial\tutorial\pipelines.py", line 16, in __init__
    dispatcher.connect(self.initialize(),signals.engine_started)
  File "C:\Anaconda2\lib\site-packages\scrapy\xlib\pydispatch\dispatcher.py", line 144, in connect
    receiver = saferef.safeRef(receiver, onDelete=_removeReceiver)
  File "C:\Anaconda2\lib\site-packages\scrapy\xlib\pydispatch\saferef.py", line 28, in safeRef
    return weakref.ref(target, onDelete)
TypeError: cannot create weak reference to 'NoneType' object

The source code of the pipeline is as follows:

import sqlite3
from os import path
from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher
class TutorialPipeline(object):
    filename='QiuShiBaiKe.db'
    def __init__(self):
        self.conn=None
        dispatcher.connect(self.initialize(),signals.engine_started)
        dispatcher.connect(self.finalize(),signals.engine_stopped)
    def process_item(self,item,spider):
        self.conn= sqlite3.connect(self.filename)
        self.conn.execute('insert into company values(?,)',(item['content']))
        self.conn.commit()
        self.conn.close()
        return item
    def initialize(self):
        if path.exists(self.filename):
            self.conn= sqlite3.connect(self.filename)
        else:
            self.conn=self.create_table(self.filename)
    def finalize(self):
        if self.conn is not None:
            self.conn.close()
            self.conn= None
    def create_table(self,filename):

        conn=sqlite3.connect(filename)
        conn.execute('''create table company(content text NOT NULL)''')
        conn.commit()
        return conn

The following code is from the scrapy library; the printed traceback points into this part of the source.

def safeRef(target, onDelete=None):
    """Return a *safe* weak reference to a callable target

    target -- the object to be weakly referenced, if it's a
        bound method reference, will create a BoundMethodWeakref,
        otherwise creates a simple weakref.
    onDelete -- if provided, will have a hard reference stored
        to the callable to be called after the safe reference
        goes out of scope with the reference object, (either a
        weakref or a BoundMethodWeakref) as argument.
    """
    if hasattr(target, 'im_self'):
        if target.im_self is not None:
            # Turn a bound method into a BoundMethodWeakref instance.
            # Keep track of these instances for lookup by disconnect().
            assert hasattr(target, 'im_func'), """safeRef target %r has im_self, but no im_func, don't know how to create reference"""%( target,)
            reference = BoundMethodWeakref(
                target=target,
                onDelete=onDelete
            )
            return reference
    if onDelete is not None:
        return weakref.ref(target, onDelete)
    else:
        return weakref.ref(target)
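The `im_self` branch above hints at why bound methods need special handling: a plain `weakref.ref` to a bound method dies immediately, because Python creates a fresh method object on every attribute access. A minimal demonstration in plain Python (no scrapy needed, `C` and `m` are illustrative names; relies on CPython's reference counting):

```python
import weakref

class C:
    def m(self):
        return 42

c = C()

# Each attribute access `c.m` builds a *new* bound-method object. After
# weakref.ref() returns, nothing holds a strong reference to that temporary,
# so CPython collects it at once and the weak reference is already dead:
r = weakref.ref(c.m)
print(r())  # None - the temporary bound method is already gone

# That is why safeRef wraps bound methods in BoundMethodWeakref, which
# weakly references the instance and the function separately instead.
```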

1 Answer:

Answer 0 (score: 0)

With the signal dispatcher, you must `.connect()` a method, not a method *call*, i.e. without the `()`:

class TutorialPipeline(object):
    filename='QiuShiBaiKe.db'
    def __init__(self):
        self.conn=None
        dispatcher.connect(self.initialize, signals.engine_started)
        dispatcher.connect(self.finalize, signals.engine_stopped)
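The difference matters because `self.initialize()` *evaluates* the method, and a method with no `return` statement yields `None`, which the dispatcher then tries to weakly reference. A minimal reproduction in plain Python (no scrapy needed, `Demo` is an illustrative name):

```python
import weakref

class Demo:
    def initialize(self):
        pass  # no return statement, so calling it yields None

d = Demo()

# Passing the call *result* hands None to weakref.ref, reproducing the error:
try:
    weakref.ref(d.initialize())   # d.initialize() evaluates to None
except TypeError as e:
    print(e)                      # cannot create weak reference to 'NoneType' object

# Passing the method object itself is what .connect() expects:
callback = d.initialize           # no parentheses: a bound method, not its result
```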

Note that the good practice is to register signals the more scrapy-idiomatic `.from_crawler()` way. Something like:

from scrapy import signals

class TutorialPipeline(object):
    filename='QiuShiBaiKe.db'
    def __init__(self):
        self.conn=None

    @classmethod
    def from_crawler(cls, crawler):
        pipe = cls()
        crawler.signals.connect(pipe.initialize,
            signal=signals.engine_started)
        crawler.signals.connect(pipe.finalize,
            signal=signals.engine_stopped)
        return pipe
    ...

Also, if you initialize the connection when the engine starts, I don't think you need to call sqlite3.connect/close on self.conn for every item.
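Putting that advice together, here is a sketch of the pipeline reusing one long-lived connection, keeping the question's class and table names. Two assumptions beyond the answer: `CREATE TABLE IF NOT EXISTS` replaces the `path.exists` check, and the insert uses a valid 1-tuple parameter, since the question's `'insert into company values(?,)'` with `(item['content'])` is both a SQL syntax error and not a tuple:

```python
import sqlite3

class TutorialPipeline(object):
    filename = 'QiuShiBaiKe.db'

    def __init__(self):
        self.conn = None

    def initialize(self):
        # engine_started: open the connection once, create the table if needed
        self.conn = sqlite3.connect(self.filename)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS company (content text NOT NULL)')
        self.conn.commit()

    def process_item(self, item, spider):
        # reuse the long-lived connection; note the 1-tuple (trailing comma)
        self.conn.execute('INSERT INTO company VALUES (?)',
                          (item['content'],))
        self.conn.commit()
        return item

    def finalize(self):
        # engine_stopped: close the connection once
        if self.conn is not None:
            self.conn.close()
            self.conn = None
```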