Error when saving scraped items to MongoDB with a Scrapy pipeline

Asked: 2017-01-10 08:02:09

Tags: python mongodb scrapy pymongo

I've been struggling with Scrapy and MongoDB for quite a while. I successfully scraped the data from http://antispam.imp.ch/spamlist, but once I added the pipeline I got KeyError: 'AntispamItem does not support field: _id'. I'm new to Python and MongoDB, and from the error log I can't see what's wrong with my code. I've tried every solution I could find on Google. I thought MongoDB generates the _id automatically on insert, but that doesn't seem to be happening here. I'd really appreciate it if someone could tell me how to fix this. Here is pipelines.py:

import pymongo
from scrapy.conf import settings


class AntispamPipeline(object):
    def __init__(self):
        connection = pymongo.MongoClient('localhost', 27017)
        db = connection['threat_ip']
        self.collection = db['data_center_test']

    def process_item(self, item, spider):
        self.collection.insert(item)
        return item

Here are the spider and the error log:

import re

import scrapy

from antispam.items import AntispamItem


class Antispam_Spider(scrapy.Spider):
    name = 'antispam'
    start_urls = ['http://antispam.imp.ch/spamlist']

    def parse(self, response):
        content = response.body
        ip_name = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', content)
        content_list = content.split('\t')
        content_data = []
        for i in range(0, len(content_list) - 1):
            if content_list[i] in ip_name:
                dic = {}
                dic['name'] = content_list[i]
                dic['time'] = content_list[i + 2]
                content_data.append(dic)

        for dic in content_data:
            item = AntispamItem()
            item['name'] = dic['name']
            item['time'] = dic['time']
            item['type'] = 'Spam Sources'
            yield item




2017-01-10 15:59:02 [scrapy.core.scraper] ERROR: Error processing {'name': '223.230.65.17',
 'time': 'Mon Jan  9 01:07:38 2017',
 'type': 'Spam Sources'}
Traceback (most recent call last):
  File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 651, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "V:\work\antispam\antispam\pipelines.py", line 16, in process_item
    self.collection.insert(item)
  File "c:\python27\lib\site-packages\pymongo\collection.py", line 2469, in insert
    check_keys, manipulate, write_concern)
  File "c:\python27\lib\site-packages\pymongo\collection.py", line 562, in _insert
    check_keys, manipulate, write_concern, op_id, bypass_doc_val)
  File "c:\python27\lib\site-packages\pymongo\collection.py", line 524, in _insert_one
    doc['_id'] = ObjectId()
  File "c:\python27\lib\site-packages\scrapy\item.py", line 63, in __setitem__
    (self.__class__.__name__, key))
KeyError: 'AntispamItem does not support field: _id'

In the official MongoDB documentation I read: "If the document does not specify an _id field, then MongoDB will add the _id field and assign a unique ObjectId for the document before inserting." So can someone tell me what is going on?

1 answer:

Answer 0 (score: 0)

Posting this as an answer since it turned out too long for a comment. At least in my case, the error occurred because I was trying to insert the Item as defined by Scrapy directly. As the Scrapy docs note, setting an unknown field on an Item raises exactly this error, and PyMongo tries to set/add the _id field on the document before inserting it.
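To see why the insert fails, here is a minimal stand-in (a hypothetical StrictItem class for illustration; it mimics how a Scrapy Item rejects undeclared fields, without requiring scrapy itself). PyMongo assigns doc['_id'] = ObjectId() before inserting, and that assignment is what the Item refuses:

```python
# Minimal stand-in for scrapy.Item: only declared fields may be set.
# (Hypothetical class for illustration; real Scrapy Items behave similarly.)
class StrictItem(dict):
    fields = ('name', 'time', 'type')

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(
                '%s does not support field: %s' % (type(self).__name__, key))
        dict.__setitem__(self, key, value)


item = StrictItem()
item['name'] = '223.230.65.17'

# PyMongo's insert does roughly: doc['_id'] = ObjectId()
try:
    item['_id'] = 'some-object-id'
except KeyError as e:
    print(e)  # 'StrictItem does not support field: _id'

# Copying into a plain dict lifts the restriction:
doc = dict(item)
doc['_id'] = 'some-object-id'  # accepted
```

The plain dict copy carries the same data but no field whitelist, so PyMongo can freely add _id to it.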

So changing this:

.insert_one(my_item)

to this:

.insert_one(dict(my_item))

fixed it for me.
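Putting it together, the asker's pipeline could look like this (a sketch, not the one true fix: the connection details and database/collection names are taken from the question, the deprecated insert() is swapped for insert_one(), and the connection is opened in open_spider, a hook Scrapy calls on pipeline start):

```python
class AntispamPipeline(object):
    def open_spider(self, spider):
        # Imported here so the sketch stays importable without pymongo installed.
        import pymongo
        # Connection details and names taken from the question.
        self.client = pymongo.MongoClient('localhost', 27017)
        self.collection = self.client['threat_ip']['data_center_test']

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Convert the Scrapy Item to a plain dict so PyMongo can add _id.
        self.collection.insert_one(dict(item))
        return item
```

The dict(item) conversion is the essential part; everything else is housekeeping around opening and closing the client alongside the spider.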