I started using scrapy to crawl some web pages and tried to save the scraped data to Sqlite3 through a Scrapy pipeline, but I ran into the problem below. Could you help check it? I checked the type of self.initialize() and it is NoneType, but when I changed the return value to a string or a boolean it still gave a similar error. I don't know where the root cause is. In the last part I have pasted the weakref source code for your reference:
'scrapy.spidermiddlewares.depth.DepthMiddleware']
Unhandled error in Deferred:
2016-11-16 07:25:49 [twisted] CRITICAL: Unhandled error in Deferred:
2016-11-16 07:25:49 [twisted] CRITICAL:
Traceback (most recent call last):
  File "C:\Anaconda2\lib\site-packages\twisted\internet\defer.py", line 1260, in _inlineCallbacks
    result = g.send(result)
  File "C:\Anaconda2\lib\site-packages\scrapy\crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "C:\Anaconda2\lib\site-packages\scrapy\crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "C:\Anaconda2\lib\site-packages\scrapy\crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "C:\Anaconda2\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.scraper = Scraper(crawler)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "C:\Anaconda2\lib\site-packages\scrapy\middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "C:\Anaconda2\lib\site-packages\scrapy\middleware.py", line 40, in from_settings
    mw = mwcls()
  File "C:\Anaconda2\log\Spider\tutorial\tutorial\pipelines.py", line 16, in __init__
    dispatcher.connect(self.initialize(),signals.engine_started)
  File "C:\Anaconda2\lib\site-packages\scrapy\xlib\pydispatch\dispatcher.py", line 144, in connect
    receiver = saferef.safeRef(receiver, onDelete=_removeReceiver)
  File "C:\Anaconda2\lib\site-packages\scrapy\xlib\pydispatch\saferef.py", line 28, in safeRef
    return weakref.ref(target, onDelete)
TypeError: cannot create weak reference to 'NoneType' object
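For reference, the last line is easy to reproduce on its own. A minimal standalone snippet (my own illustration, not part of the project) raises the same TypeError, since a weak reference to None is impossible and self.initialize() returns None:

import weakref
weakref.ref(None)  # TypeError: cannot create weak reference to 'NoneType' object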
The source code of the pipeline is as follows:
import sqlite3
from os import path
from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher
class TutorialPipeline(object):
    filename = 'QiuShiBaiKe.db'

    def __init__(self):
        self.conn = None
        dispatcher.connect(self.initialize(), signals.engine_started)
        dispatcher.connect(self.finalize(), signals.engine_stopped)

    def process_item(self, item, spider):
        self.conn = sqlite3.connect(self.filename)
        self.conn.execute('insert into company values(?,)', (item['content']))
        self.conn.commit()
        self.conn.close()
        return item

    def initialize(self):
        if path.exists(self.filename):
            self.conn = sqlite3.connect(self.filename)
        else:
            self.conn = self.create_table(self.filename)

    def finalize(self):
        if self.conn is not None:
            self.conn.close()
            self.conn = None

    def create_table(self, filename):
        conn = sqlite3.connect(filename)
        conn.execute('''create table company(content text NOT NULL)''')
        conn.commit()
        return conn
The following code is from the scrapy library; the traceback above ends in this part of the source:
def safeRef(target, onDelete=None):
    """Return a *safe* weak reference to a callable target

    target -- the object to be weakly referenced, if it's a
        bound method reference, will create a BoundMethodWeakref,
        otherwise creates a simple weakref.
    onDelete -- if provided, will have a hard reference stored
        to the callable to be called after the safe reference
        goes out of scope with the reference object, (either a
        weakref or a BoundMethodWeakref) as argument.
    """
    if hasattr(target, 'im_self'):
        if target.im_self is not None:
            # Turn a bound method into a BoundMethodWeakref instance.
            # Keep track of these instances for lookup by disconnect().
            assert hasattr(target, 'im_func'), """safeRef target %r has im_self, but no im_func, don't know how to create reference"""%( target,)
            reference = BoundMethodWeakref(
                target=target,
                onDelete=onDelete
                )
            return reference
    if onDelete is not None:
        return weakref.ref(target, onDelete)
    else:
        return weakref.ref(target)
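Stepping through the two branches above: a bound method has im_self on Python 2, so it is wrapped in a BoundMethodWeakref, while anything else falls through to plain weakref.ref(). A small standalone sketch (a hypothetical Example class, not from the project) shows which branch each form of the argument takes:

import weakref

class Example(object):
    def initialize(self):
        pass  # returns None, just like the pipeline's initialize()

obj = Example()
# Passing the method itself: a bound method with im_self (Python 2),
# so safeRef takes the BoundMethodWeakref branch.
print(hasattr(obj.initialize, 'im_self'))  # True

# Passing the result of *calling* the method: None has no im_self, so
# safeRef falls through to weakref.ref(None) and raises the TypeError.
weakref.ref(obj.initialize())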
Answer 0 (score: 0)
With the signal dispatcher you must .connect() the method itself, not the result of a method call, i.e. without the ():
class TutorialPipeline(object):
    filename = 'QiuShiBaiKe.db'

    def __init__(self):
        self.conn = None
        dispatcher.connect(self.initialize, signals.engine_started)
        dispatcher.connect(self.finalize, signals.engine_stopped)
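With the () removed, dispatcher.connect() receives the bound methods themselves; a bound method has im_self, so safeRef wraps it in a BoundMethodWeakref instead of failing on weakref.ref(None).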
Note that good practice is to register signals the more scrapy-idiomatic .from_crawler() way. Something like this:
from scrapy import signals

class TutorialPipeline(object):
    filename = 'QiuShiBaiKe.db'

    def __init__(self):
        self.conn = None

    @classmethod
    def from_crawler(cls, crawler):
        pipe = cls()
        crawler.signals.connect(pipe.initialize,
                                signal=signals.engine_started)
        crawler.signals.connect(pipe.finalize,
                                signal=signals.engine_stopped)
        return pipe

    ...
Also, if you initialize self.conn when the engine starts, I don't think you need to call sqlite3.connect/close for every item.
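For example, a sketch of the three methods (assuming the same company table and content column as in the question, and using create table if not exists instead of the path.exists() check):

def initialize(self):
    # Open one connection when the engine starts and keep it around.
    self.conn = sqlite3.connect(self.filename)
    self.conn.execute('create table if not exists company (content text NOT NULL)')
    self.conn.commit()

def process_item(self, item, spider):
    # Reuse the open connection; note the parameter must be a 1-tuple.
    self.conn.execute('insert into company values (?)', (item['content'],))
    self.conn.commit()
    return item

def finalize(self):
    if self.conn is not None:
        self.conn.close()
        self.conn = None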