How to cache a paginated Django queryset

Time: 2014-01-13 04:09:57

Tags: python django django-models memcached django-views

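How do you cache a paginated Django queryset, specifically in a ListView?

I noticed that one query was taking a long time to run, so I'm trying to cache it. The queryset is huge (over 100k records), so I'm trying to cache only paginated subsections of it. I can't cache the entire view or template, because some sections are user/session specific and need to change constantly.

ListView has two standard methods for retrieving the queryset: get_queryset(), which returns the unpaginated data, and paginate_queryset(), which filters it down to the current page.

I first tried caching the query in get_queryset(), but quickly realized that calling cache.set(my_query_key, super(MyView, self).get_queryset()) causes the entire query to be serialized.

So then I tried overriding paginate_queryset(), like: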

import time
from functools import partial
from django.core.cache import cache
from django.views.generic import ListView

class MyView(ListView):

    ...

    def paginate_queryset(self, queryset, page_size):
        cache_key = 'myview-queryset-%s-%s' % (self.page, page_size)
        print 'paginate_queryset.cache_key:',cache_key
        t0 = time.time()
        ret = cache.get(cache_key)
        if ret is None:
            print 're-caching'
            ret = super(MyView, self).paginate_queryset(queryset, page_size)
            cache.set(cache_key, ret, 60*60)
        td = time.time() - t0
        print 'paginate_queryset.time.seconds:',td
        (paginator, page, object_list, other_pages) = ret
        print 'total objects:',len(object_list)
        return ret

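However, this takes almost a minute to run, even though only 10 objects are retrieved, and every request prints "re-caching", implying that nothing is being saved to the cache.

My settings.CACHES looks like: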

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

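service memcached status shows that memcached is running, and tail -f /var/log/memcached.log shows absolutely nothing.

What am I doing wrong? What is the proper way to cache a paginated query so that the entire queryset isn't retrieved?

Edit: I think there may be a bug in either memcached or the Python wrapper. Django appears to support two different memcached backends, one using python-memcached and one using pylibmc. python-memcached seems to silently hide the error raised when caching the paginate_queryset() value. When I switched to the pylibmc backend, I got an explicit error message, "error 10 from memcached_set: SERVER ERROR", tracing back to line 78 of django/core/cache/backends/memcached.py.

3 answers:

Answer 0: (score: 1)

You can extend the Paginator to support caching by means of a provided cache_key.

A blog post about the usage and implementation of such a CachedPaginator can be found here. The source code is posted at djangosnippets.org (here is a web archive link, because the original no longer works).

However, I will post an example slightly modified from the original version, which caches not only the objects of each page but also the total count. (Sometimes even the count can be an expensive operation.)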

from django.core.cache import cache
from django.utils.functional import cached_property
from django.core.paginator import Paginator, Page, PageNotAnInteger


class CachedPaginator(Paginator):
    """A paginator that caches the results on a page by page basis."""
    def __init__(self, object_list, per_page, orphans=0, allow_empty_first_page=True, cache_key=None, cache_timeout=300):
        super(CachedPaginator, self).__init__(object_list, per_page, orphans, allow_empty_first_page)
        self.cache_key = cache_key
        self.cache_timeout = cache_timeout

    @cached_property
    def count(self):
        """
            The original django.core.paginator.count attribute in Django 1.8
            is not writable and can't be set manually, but we would like
            to override it when loading data from the cache (instead of recalculating it).
            So we make it writable via @cached_property.
        """
        return super(CachedPaginator, self).count

    def set_count(self, count):
        """
            Override the paginator.count value (to prevent recalculation)
            and clear num_pages and page_range which values depend on it.
        """
        self.count = count
        # if somehow we have stored .num_pages or .page_range (which are cached properties)
        # this can lead to wrong page calculations (because they depend on paginator.count value)
        # so we clear their values to force recalculations on next calls
        try:
            del self.num_pages
        except AttributeError:
            pass
        try:
            del self.page_range
        except AttributeError:
            pass

    @cached_property
    def num_pages(self):
        """This is not writable in Django1.8. We want to make it writable"""
        return super(CachedPaginator, self).num_pages

    @cached_property
    def page_range(self):
        """This is not writable in Django1.8. We want to make it writable"""
        return super(CachedPaginator, self).page_range

    def page(self, number):
        """
        Returns a Page object for the given 1-based page number.

        This will attempt to pull the results out of the cache first, based on
        the requested page number. If not found in the cache,
        it will pull a fresh list and then cache that result + the total result count.
        """
        if self.cache_key is None:
            return super(CachedPaginator, self).page(number)

        # In order to avoid counting the queryset,
        # we only validate that the provided number is an integer.
        # The rest of the validation happens when we fetch fresh data,
        # so if the number is invalid, nothing is cached.
        # number = self.validate_number(number)
        try:
            number = int(number)
        except (TypeError, ValueError):
            raise PageNotAnInteger('That page number is not an integer')

        page_cache_key = "%s:%s:%s" % (self.cache_key, self.per_page, number)
        page_data = cache.get(page_cache_key)

        if page_data is None:
            page = super(CachedPaginator, self).page(number)
            # Cache not only the objects, but the total count too.
            page_data = (page.object_list, self.count)
            cache.set(page_cache_key, page_data, self.cache_timeout)
        else:
            cached_object_list, cached_total_count = page_data
            self.set_count(cached_total_count)
            page = Page(cached_object_list, number, self)

        return page

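Answer 1: (score: 0)

The problem turned out to be a combination of factors. Mainly, the result returned by paginate_queryset() contains a reference to the unlimited queryset, which makes it essentially uncacheable. When I called cache.set(mykey, (paginator, page, object_list, other_pages)), it tried to serialize thousands of records instead of just the page_size records I was expecting, so the cached item exceeded memcached's limits and the call failed.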
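One way to check whether a value will fit is to measure its pickled size before caching it. The following is a minimal sketch, assuming memcached's default item limit of 1 MB; plain lists stand in for an evaluated page and an evaluated full result set, and pickled_size and fits_in_memcached are hypothetical helper names, not Django APIs.

```python
import pickle

# Assumption: memcached's default per-item limit of 1 MB is in effect.
MEMCACHED_ITEM_LIMIT = 1024 * 1024


def pickled_size(value):
    """Return the number of bytes the value occupies once pickled."""
    return len(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))


def fits_in_memcached(value, limit=MEMCACHED_ITEM_LIMIT):
    """Return True if the pickled value is smaller than the item limit."""
    return pickled_size(value) < limit


# Stand-ins: 100k "rows" for the full result set, 10 rows for one page.
all_rows = [{"id": i, "title": "row %d" % i} for i in range(100000)]
one_page = all_rows[:10]

print("one page fits:", fits_in_memcached(one_page))
print("all rows fit:", fits_in_memcached(all_rows))
```

The check is only approximate, since the key and the backend's own overhead also count toward the limit, but it is enough to tell a 10-row page apart from a value that drags the whole queryset along.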

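The other factor was the horrible default error reporting in memcached/python-memcached, which silently hides all errors and turns cache.set() into a no-op if anything goes wrong, making the problem very time-consuming to track down.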
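A silent failure like this can be surfaced by reading the value back right after storing it. The sketch below assumes the value is never None and the key was not already set; SilentCache is a hypothetical stand-in that mimics a backend dropping oversized values, and checked_set is an illustrative helper, not part of Django or python-memcached.

```python
class SilentCache(object):
    """Stand-in for a backend that silently drops oversized values."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.data = {}

    def set(self, key, value, timeout=None):
        # Mimic the silent failure: oversized values are simply not stored.
        if len(value) <= self.max_len:
            self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)


def checked_set(cache, key, value, timeout=None):
    """Store a value, read it back, and raise if it was not stored."""
    cache.set(key, value, timeout)
    if cache.get(key) is None:
        raise RuntimeError("cache.set() silently failed for key %r" % key)


cache = SilentCache(max_len=10)
checked_set(cache, "small", list(range(5)))  # small enough, gets stored
try:
    checked_set(cache, "large", list(range(1000)))  # dropped by the stand-in
except RuntimeError as exc:
    print(exc)
```

The read-back costs one extra round trip per write, so it is probably best kept to debugging sessions.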

I fixed this by essentially rewriting paginate_queryset() to drop Django's built-in paginator functionality altogether and slicing the queryset myself with:

object_list = queryset[page_size*(page-1):page_size*(page-1)+page_size]

and then caching object_list.
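Put together, the slice-then-cache approach might look like the following minimal sketch. It assumes a cache object with Django-style get() and set() methods; DictCache is a hypothetical in-memory stand-in for django.core.cache.cache, get_cached_page is an illustrative name, and a plain list stands in for the queryset.

```python
def get_cached_page(cache, items, page, page_size, timeout=60 * 60):
    """Return one page of items, caching only that page's rows."""
    cache_key = "myview-page-%s-%s" % (page, page_size)
    object_list = cache.get(cache_key)
    if object_list is None:
        start = page_size * (page - 1)
        # list() forces evaluation so that only this page's rows are stored;
        # on a Django queryset the slice becomes a LIMIT/OFFSET query.
        object_list = list(items[start:start + page_size])
        cache.set(cache_key, object_list, timeout)
    return object_list


class DictCache(object):
    """Minimal in-memory stand-in for django.core.cache.cache."""

    def __init__(self):
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value, timeout=None):
        self.data[key] = value


cache = DictCache()
rows = list(range(1, 101))  # stand-in for the 100k-record queryset
second_page = get_cached_page(cache, rows, page=2, page_size=10)
print(second_page)
```

Because the cached value is a plain list of one page's rows, it carries no reference back to the unlimited queryset.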

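Answer 2: (score: 0)

I wanted to paginate the infinite-scroll view on my home page, and this is the solution I came up with. It is a combination of Django CCBV and the author's original solution.

The response times, however, didn't improve as much as I had hoped, but that is probably because I'm testing locally with only 6 posts and 2 users, haha.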
