MongoEngine - How to perform a "save new item or increment counter" operation?

Time: 2013-01-31 10:21:21

Tags: python mongodb mongoengine

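I'm using MongoEngine in a web-scraping project. I would like to keep track of every image I encounter across all the scraped web pages.

To do so, I store the image src URL together with the number of times the image has been encountered.

The MongoEngine model is defined as follows: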

from mongoengine import Document, IntField, URLField


class ImagesUrl(Document):
    """ Model representing images encountered during web-scraping.

    When an image is encountered on a web-page during scraping,
    we store its url and the number of times it has been
    seen (default counter value is 1).
    If the image had been seen before, we do not insert a new document
    in collection, but merely increment the corresponding counter value.

    """

    # The url of the image. There cannot be any duplicate.
    src = URLField(required=True, unique=True)

    # counter of the total number of occurrences of the image during
    # the datamining process
    counter = IntField(min_value=0, required=True, default=1)

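I'm looking for the proper way to implement this "save or increment" process.

So far I'm handling it as shown here, but I feel there might be a better, built-in way of doing it with MongoEngine: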

def save_or_increment(self):
    """ If it is the first time the image has been encountered, insert
        its src in mongo, along with a counter=1 value.
        If not, increment its counter value by 1.

    """ 
    # check if item is already stored
    # if not, save a new item
    if not ImagesUrl.objects(src=self.src):
        ImagesUrl(
            src=self.src,
            counter=self.counter,
            ).save()
    else:
        # if item already stored in Mongo, just increment its counter
        ImagesUrl.objects(src=self.src).update_one(inc__counter=1)

Is there a better way to do this?

Thank you very much for your time.

2 Answers:

Answer 0: (score: 10)

You should be able to do an upsert, e.g.:

ImagesUrl.objects(src=self.src).update_one(
    upsert=True,
    inc__counter=1,
    set__src=self.src)
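
To make the upsert semantics concrete, here is a minimal pure-Python sketch that uses a plain dict in place of the MongoDB collection. The helper `upsert_increment` and the example URLs are hypothetical and not part of MongoEngine; the sketch only mirrors the idea that an upsert creates the document when nothing matches, and that incrementing a field which does not exist yet leaves it equal to the increment.

```python
# Pure-Python stand-in for the collection: maps src -> document dict.
# upsert_increment is a hypothetical helper, not a MongoEngine API.
def upsert_increment(collection, src, amount=1):
    """Mimic the effect of update_one(upsert=True, inc__counter=amount)."""
    doc = collection.get(src)
    if doc is None:
        # Upsert path: nothing matched, so a new document is created.
        # The counter starts from 0 and the increment below brings it to 1.
        doc = {"src": src, "counter": 0}
        collection[src] = doc
    # Increment path, shared by new and existing documents.
    doc["counter"] += amount


images = {}
upsert_increment(images, "http://example.com/a.png")
upsert_increment(images, "http://example.com/a.png")
upsert_increment(images, "http://example.com/b.png")
print(images["http://example.com/a.png"]["counter"])  # 2
print(images["http://example.com/b.png"]["counter"])  # 1
```

The point of doing this in a single call is that the existence check and the write are no longer two separate steps in application code.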

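Answer 1: (score: 0)

The update_one in @ross's answer returns the number of modified documents (or the full update result); it does not return the document or the new counter value. If you want the document back, use upsert_one: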

images_url = ImagesUrl.objects(src=self.src).upsert_one(
                                              inc__counter=1,
                                              set__src=self.src)
print(images_url.counter)

This creates the document if it does not exist; otherwise it modifies the existing one and increments its counter.
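
The difference in return values can be sketched in plain Python with a dict standing in for the collection. Both helpers below (`update_one_like`, `upsert_one_like`) are hypothetical illustrations, not MongoEngine functions, and the count of 1 returned by the first is an assumption of the sketch rather than a statement of MongoEngine's exact result.

```python
# Hypothetical in-memory stand-ins; a dict maps src -> document dict.
def update_one_like(collection, src):
    """Upsert and increment, returning only a count of touched documents."""
    doc = collection.setdefault(src, {"src": src, "counter": 0})
    doc["counter"] += 1
    return 1  # the caller learns nothing about the new counter value


def upsert_one_like(collection, src):
    """Upsert and increment, returning the resulting document."""
    doc = collection.setdefault(src, {"src": src, "counter": 0})
    doc["counter"] += 1
    return dict(doc)  # a copy, so the caller can read the new counter


store = {}
touched = update_one_like(store, "http://example.com/a.png")
image = upsert_one_like(store, "http://example.com/a.png")
print(touched)            # 1
print(image["counter"])   # 2
```

If the scraper only needs the side effect, the count-returning form is enough; the document-returning form saves a second query when the new counter is needed right away.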