I have been struggling with Scrapy and MongoDB for quite a while. I successfully scraped the data from http://antispam.imp.ch/spamlist, but once my pipeline runs, KeyError: 'AntispamItem does not support field: _id' appears.
I'm new to Python and MongoDB, and I can't tell from the error log what is wrong with my code. I have tried every solution I could find on Google. I thought MongoDB would generate the _id automatically on insert, but that doesn't seem to be happening here. I would really appreciate it if someone could tell me how to fix this.
Here is pipelines.py:
import pymongo
from scrapy.conf import settings

class AntispamPipeline(object):
    def __init__(self):
        connection = pymongo.MongoClient('localhost', 27017)
        db = connection['threat_ip']
        self.collection = db['data_center_test']

    def process_item(self, item, spider):
        self.collection.insert(item)
        return item
Here are the spider and the error log:
import re
import scrapy
from antispam.items import AntispamItem

class Antispam_Spider(scrapy.Spider):
    name = 'antispam'
    start_urls = ['http://antispam.imp.ch/spamlist']

    def parse(self, response):
        content = response.body
        ip_name = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', content)
        content_list = content.split('\t')
        content_data = []
        for i in range(0, len(content_list) - 1):
            if content_list[i] in ip_name:
                dic = {}
                dic['name'] = content_list[i]
                dic['time'] = content_list[i + 2]
                content_data.append(dic)
        for dic in content_data:
            item = AntispamItem()
            item['name'] = dic['name']
            item['time'] = dic['time']
            item['type'] = 'Spam Sources'
            yield item
KeyError: 'AntispamItem does not support field: _id'
2017-01-10 15:59:02 [scrapy.core.scraper] ERROR: Error processing {'name': '223.230.65.17',
'time': 'Mon Jan 9 01:07:38 2017',
'type': 'Spam Sources'}
Traceback (most recent call last):
File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 651, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "V:\work\antispam\antispam\pipelines.py", line 16, in process_item
self.collection.insert(item)
File "c:\python27\lib\site-packages\pymongo\collection.py", line 2469, in insert
check_keys, manipulate, write_concern)
File "c:\python27\lib\site-packages\pymongo\collection.py", line 562, in _insert
check_keys, manipulate, write_concern, op_id, bypass_doc_val)
File "c:\python27\lib\site-packages\pymongo\collection.py", line 524, in _insert_one
doc['_id'] = ObjectId()
File "c:\python27\lib\site-packages\scrapy\item.py", line 63, in __setitem__
(self.__class__.__name__, key))
KeyError: 'AntispamItem does not support field: _id'
In MongoDB's official documentation I read: "If the document does not specify an _id field, then MongoDB will add the _id field and assign a unique ObjectId for the document before inserting."
So can someone tell me what is going on?
Answer 0 (score: 0)
I found this shortly after commenting. At least in my case, it was because I was trying to insert the Item as defined by Scrapy. In the docs you can see that setting an unknown field value will produce this error, and MongoDB is trying to set/add the _id field on the Item.

So changing this:

.insert_one(my_item)

to this:

.insert_one(dict(my_item))

fixed it for me.
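To see why the dict() conversion helps, here is a minimal, dependency-free sketch of the mechanism. FieldRestrictedItem below is a hypothetical stand-in for scrapy.Item (it is not the real class): like a Scrapy Item, it rejects any key that was not declared in its fields, which is exactly what happens when PyMongo tries to write '_id' into the Item during insert. Converting the item to a plain dict first removes that restriction.

```python
class FieldRestrictedItem(dict):
    """Hypothetical stand-in for scrapy.Item: only declared fields are allowed."""
    fields = ('name', 'time', 'type')  # like the fields declared on AntispamItem

    def __setitem__(self, key, value):
        if key not in self.fields:
            # Same style of error Scrapy raises for unknown fields
            raise KeyError('%s does not support field: %s'
                           % (type(self).__name__, key))
        dict.__setitem__(self, key, value)


item = FieldRestrictedItem()
item['name'] = '223.230.65.17'

# PyMongo's insert effectively does: item['_id'] = ObjectId()
try:
    item['_id'] = 'fake-object-id'
except KeyError as exc:
    print('insert on the Item fails:', exc)

# Converting to a plain dict first lifts the field restriction:
plain = dict(item)
plain['_id'] = 'fake-object-id'  # no error; MongoDB can add _id freely
print('insert on the plain dict works:', plain)
```

The same reasoning applies to the pipeline in the question: self.collection.insert(dict(item)) instead of self.collection.insert(item) avoids the KeyError, because PyMongo then mutates an ordinary dict rather than the field-restricted Item.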