Scrapy saving items to MongoDB: overwriting duplicate data

Time: 2014-08-11 02:38:14

Tags: python mongodb django-models scrapy

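I want to check whether a scraped item's titleURL already exists in MongoDB and, if it does, overwrite the existing record instead of inserting a duplicate. Please guide me: how do I filter on titleURL between Scrapy and MongoDB?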

items.py:

from scrapy.contrib.djangoitem import DjangoItem
from mongo_test.models import Ct
class CtItem(DjangoItem):
    django_model = Ct  

mongo_test/models.py:

from django.db import models

class Ct(models.Model):
    title       = models.CharField(max_length=100)
    titleURL    = models.URLField(max_length=255)
    .....

pipeline.py:

from mongo_test.models import Ct

class CtPipeline(object):
    def process_item(self, item, spider):
        ct = item.save(commit=False)
        # How do I match the scraped titleURL against the titleURL stored in MongoDB?
        ct_exist = Ct.objects.filter()
        if ct_exist:
            # overwrite the existing MongoDB record here
            pass
        ct.save()
        return item

In the Django project:

settings.py:

DATABASES = {
    'default': {
        'ENGINE': 'django_mongodb_engine',
        'NAME': 'scrapy',
    } 
}
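
One possible way to get the behaviour described in the question is sketched below: look up a stored row by titleURL and, if one is found, reuse its primary key so that save() overwrites it. This is only a sketch under stated assumptions. The helper save_overwriting_duplicate and the class OverwriteDuplicatePipeline are hypothetical names, not part of Scrapy or Django, and the sketch assumes that django_mongodb_engine treats save() on an instance with an existing primary key as an update, as the stock Django ORM does.

```python
def save_overwriting_duplicate(model_cls, instance, key_field="titleURL"):
    """Save `instance`, overwriting a stored row that has the same key_field.

    Hypothetical helper. It relies only on standard Django ORM calls:
    Manager.filter(), queryset slicing, and Model.save().
    """
    key_value = getattr(instance, key_field)
    # Fetch at most one existing row whose key matches the scraped value.
    matches = list(model_cls.objects.filter(**{key_field: key_value})[:1])
    if matches:
        # Reusing the primary key makes save() update that row
        # instead of inserting a new one.
        instance.pk = matches[0].pk
    instance.save()
    return instance


class OverwriteDuplicatePipeline(object):
    """Pipeline sketch; set `model` to the Django model class (here, Ct)."""

    model = None  # assumption: assigned in the project, e.g. model = Ct

    def process_item(self, item, spider):
        # DjangoItem.save(commit=False) builds the model instance
        # without writing it to the database.
        instance = item.save(commit=False)
        save_overwriting_duplicate(self.model, instance)
        return item
```

In the project from the question this would be used by importing Ct from mongo_test.models and setting model = Ct on the pipeline class. Two items arriving at the same time with the same titleURL could still both be inserted, so a unique index on titleURL may be worth considering as well.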

0 answers:

No answers