Question

I've been working on a subclass of db.Model that is cached automatically, i.e.:

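instance.put stores the entity in memcache before saving it to the datastore
class.get_by_key_name checks the cache first and, on a miss, goes to the datastore to retrieve the entity and caches it after retrieval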

I've developed the approach below (which appears to work for me), but I have a few questions:

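I read Nick Johnson's article on efficient model memcaching, which suggests implementing serialization to memcache via protocol buffers. Looking at the source code of the memcache API in the SDK, it looks like Google has already implemented protobuf serialization by default. Is my interpretation correct?
Am I missing any important details (ones that might bite me later) in the way I'm subclassing db.Model or overriding these two methods?
Is there a more efficient way to implement what I've done below?
Are there guidelines, benchmarks, or best practices for when this kind of entity caching makes sense from a performance perspective? Or does it always make sense to cache entities? On a related note, should I read anything into the fact that Google doesn't provide a cached model in the modeling API? Are there too many special cases to consider?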

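Here is my current implementation. I'd really appreciate any and all guidance or advice on caching entities (even if your response is not a direct answer to one of the 4 questions above but is relevant to the topic overall).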

from google.appengine.ext import db
from google.appengine.api import memcache

import os
import logging

class CachedModel(db.Model):
    '''Subclass of db.Model that automatically caches entities for put and 
    attempts to load from cache for get_by_key_name
    '''

    @classmethod
    def get_by_key_name(cls, key_names, parent=None, **kwargs):
        cache = memcache.Client()
        # Ensure that every new deployment of the application results in a cache miss
        # by including the application version ID in the namespace of the cache entry
        namespace = os.environ['CURRENT_VERSION_ID'] + '_' + cls.__name__

        if not isinstance(key_names, list):
            key_names = [key_names]
        entities = cache.get_multi(key_names, namespace=namespace)
        if entities:
            logging.info('%s (namespace=%s) retrieved from memcache' % (str(entities.keys()), namespace))

        missing_key_names = list(set(key_names) - set(entities.keys()))
        # For keys missed in memcache, attempt to retrieve entities from datastore
        if missing_key_names:
            missing_entities = super(CachedModel, cls).get_by_key_name(missing_key_names, parent, **kwargs)
            missing_mapping = zip(missing_key_names, missing_entities)
            # Determine entities that exist in datastore and store them to memcache 
            entities_to_cache = dict()
            for key_name, entity in missing_mapping:
                if entity:
                    entities_to_cache[key_name] = entity
            if entities_to_cache:
                logging.info('%s (namespace=%s) cached by get_by_key_name' % (str(entities_to_cache.keys()), namespace))
                cache.set_multi(entities_to_cache, namespace=namespace)
            non_existent = set(missing_key_names) - set(entities_to_cache.keys())
            if non_existent:
                logging.info('%s (namespace=%s) missing from cache and datastore' % (str(non_existent), namespace))
            # Combine entities retrieved from cache and entities retrieved from datastore
            entities.update(missing_mapping)

        if len(key_names) == 1:
            return entities[key_names[0]]
        else:
            return [entities[key_name] for key_name in key_names]

    def put(self, **kwargs):
        cache = memcache.Client()
        namespace = os.environ['CURRENT_VERSION_ID'] + '_' + self.__class__.__name__
        cache.set(self.key().name(), self, namespace=namespace)
        logging.info('%s (namespace=%s) cached by put' % (self.key().name(), namespace))
        return super(CachedModel, self).put(**kwargs)

Answer 1

Rather than reinventing the wheel, why not switch to NDB, which already implements memcaching of model instances?

Answer 2

You could check out Nick Johnson's article on adding pre and post hooks for data model classes as an alternative to overriding get_by_key_name. That way, your hooks work even when db.get and db.put are used.

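That said, I've found in my own apps that I get more dramatic performance improvements by caching at a higher level: for example, all the content I need to render an entire page or, if possible, the page's HTML itself.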
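
A minimal sketch of the higher-level caching described above, assuming a cache object that exposes get(key) and set(key, value, time=...) the way the memcache module does. DictCache, cached_fragment, and render_profile are hypothetical names used only for illustration; on App Engine the memcache module itself could probably be passed in place of DictCache.

```python
class DictCache(object):
    """In-process stand-in for memcache (hypothetical; ignores expiry)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like memcache.get, return None on a miss.
        return self._data.get(key)

    def set(self, key, value, time=0):
        self._data[key] = value
        return True


def cached_fragment(cache, key, render, time=300):
    """Return the cached HTML for key, rendering and caching it on a miss."""
    html = cache.get(key)
    if html is None:
        html = render()
        cache.set(key, html, time=time)
    return html


calls = []


def render_profile():
    # Stands in for the datastore queries and templating behind one page.
    calls.append(1)
    return '<h1>Profile</h1>'


cache = DictCache()
first = cached_fragment(cache, 'page:/profile', render_profile)
second = cached_fragment(cache, 'page:/profile', render_profile)
```

The second call is served from the cache, so the expensive render step runs only once per key until the entry expires or is evicted.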

You might also check out the asynctools library, which can help you run datastore queries in parallel and cache the results.

Answer 3

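Many of the great tips from Nick Johnson that I wanted to implement are already implemented in the appengine-mp module, such as serialization via protocol buffers and prefetching entities.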
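
A rough sketch of the serialize-before-caching idea, assuming the approach is to encode entities yourself instead of relying on the memcache client's default handling of Python objects. SerializingCache and DictCache are hypothetical names, and the JSON codec is only a stand-in so the sketch runs anywhere; on App Engine the codec would presumably be built on db.model_to_protobuf and db.model_from_protobuf, as noted in the comments. This is not appengine-mp's actual code.

```python
import json


class DictCache(object):
    """In-process stand-in for memcache.Client (hypothetical)."""

    def __init__(self):
        self.data = {}

    def get_multi(self, keys):
        # Like memcache, return only the keys that were found.
        return dict((k, self.data[k]) for k in keys if k in self.data)

    def set_multi(self, mapping):
        self.data.update(mapping)
        return []  # memcache returns the keys that failed to be set


class SerializingCache(object):
    """Encode values to compact strings before they reach the cache."""

    def __init__(self, cache, encode, decode):
        self.cache = cache
        self.encode = encode
        self.decode = decode

    def set_multi(self, mapping):
        encoded = dict((k, self.encode(v)) for k, v in mapping.items())
        return self.cache.set_multi(encoded)

    def get_multi(self, keys):
        found = self.cache.get_multi(keys)
        return dict((k, self.decode(v)) for k, v in found.items())


# On App Engine (an assumption, not exercised here) the codec could be:
#   encode = lambda entity: db.model_to_protobuf(entity).Encode()
#   decode = lambda data: db.model_from_protobuf(entity_pb.EntityProto(data))
backend = DictCache()
cache = SerializingCache(backend, json.dumps, json.loads)
cache.set_multi({'alice': {'age': 30}})
result = cache.get_multi(['alice', 'bob'])
```

Keeping the codec injectable means the caching logic can be tested without the SDK and the serialization format can be changed in one place.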

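Regarding your get_by_key_names method, you can check the code. If you want to create your own db.Model layer, maybe this can help you, but you could also contribute to improving the existing one. ;)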

Automatically cached models in App Engine