How can I add memcache or otherwise improve latency?

Time: 2013-12-24 13:35:08

Tags: python performance google-app-engine memcached

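According to webpagetest.org, my two pages have very different latency. It seems memcache speeds up one of them, so why is page 2 still so slow when I use memcache, even with images and JavaScript turned off? Why is the time to first byte so slow for the list page when I use memcache?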


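The slow code is what I should profile, but when I debug it, the log shows that it runs through the whole function quickly and gets its data from memcache: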

2013-12-24 14:12:27.426 /india 200 12919ms 16kb Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 module=default version=2014c
I 2013-12-24 14:12:14.573 got India data from memcache
D 2013-12-24 14:12:14.576 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 14:12:14.577 got data 10381 from memcache
D 2013-12-24 14:12:14.577 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 14:12:14.579 got data 930 from memcache
D 2013-12-24 14:12:14.579 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 14:12:14.580 got data 817 from memcache
D 2013-12-24 14:12:14.580 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 14:12:14.581 got data 455 from memcache
D 2013-12-24 14:12:14.582 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 14:12:14.583 got data 137 from memcache
D 2013-12-24 14:12:14.583 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 14:12:14.584 got data 175 from memcache

The memcache helper functions are:

    def get_jobs_count(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'jobscount' + str(city)
        data = memcache.get(key)
        jobs_count = None
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            if city and int(city) > 0:
                logging.debug('fetching number of ads for city %s',
                              str(city))
                jobs_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('6010', '6020', '6030', '6040', '6090') AND cities = KEY('City', :1) AND published = True AND modified > :2 "
                                , city, datetime.now()
                                - timedelta(days=609))
                jobs_count = jobs_count_gql.count(limit=40000)
            elif region and int(region) > 0:
                logging.debug('fetching number of ads for region %s',
                              str(region))
                jobs_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('6010', '6020', '6030', '6040', '6090') AND regions = KEY('Region', :1) AND published = True AND modified > :2"
                                , int(region), datetime.now()
                                - timedelta(days=609))

                # jobs_count_gql = db.GqlQuery("SELECT * FROM Ad WHERE category IN ('6010', '6020', '6030', '6040', '6090') AND regions = KEY('Region', :1) AND published = True AND modified > :2 ",region, datetime.now() - timedelta(days=609))

                jobs_count = jobs_count_gql.count(limit=40000)
            #logging.debug('adding count %d to memcache', jobs_count)
            memcache.add(key, jobs_count, 36000)
            return jobs_count

    def get_electronics_count(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'electronicscount' + str(city)
        data = memcache.get(key)
        electronics_count = 0
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            if city and int(city) > 0:
                electronics_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('5010', '5020', '5030', '5040') AND cities = KEY('City', :1) AND published = True AND modified > :2 "
                                , city, datetime.now()
                                - timedelta(days=609))
                electronics_count = electronics_count_gql.count(limit=40000)

            elif region and int(region) > 0:
                electronics_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('5010', '5020', '5030', '5040') AND regions = KEY('Region', :1) AND published = True AND modified > :2 "
                                , int(region), datetime.now()
                                - timedelta(days=609))
                electronics_count = electronics_count_gql.count(limit=40000)
            memcache.add(key, electronics_count, 36000)
            return electronics_count

    def get_estate_count(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'estatecount' + str(city)
        data = memcache.get(key)
        estate_count = 0
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            if city and int(city) > 0:
                estate_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('1010', '1020', '1030', '1050', '1080', '1090', '1100') AND cities = KEY('City', :1) AND published = True AND modified > :2 "
                                , city, datetime.now()
                                - timedelta(days=609))
                estate_count = estate_count_gql.count(limit=40000)

            elif region and int(region) > 0:
                estate_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('1010', '1020', '1030', '1050', '1080', '1090', '1100') AND regions = KEY('Region', :1) AND published = True AND modified > :2 "
                                , int(region), datetime.now()
                                - timedelta(days=609))
                estate_count = estate_count_gql.count(limit=40000)
            memcache.add(key, estate_count, 36000)
            return estate_count


    def get_home_count(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'homecount' + str(city)
        data = memcache.get(key)
        home_count = None
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            if city and int(city) > 0:
                home_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('3030', '3040', '3050', '3060') AND cities = KEY('City', :1) AND published = True AND modified > :2 "
                                , city, datetime.now()
                                - timedelta(days=609))
                home_count = home_count_gql.count(limit=40000)

            elif region and int(region) > 0:
                home_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('3030', '3040', '3050', '3060') AND regions = KEY('Region', :1) AND published = True AND modified > :2 "
                                , int(region), datetime.now()
                                - timedelta(days=609))
                home_count = home_count_gql.count(limit=40000)
            memcache.add(key, home_count, 36000)
            return home_count

    def get_leisure_count(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'leisurecount' + str(city)
        data = memcache.get(key)
        leisure_count = None
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            if city and int(city) > 0:
                leisure_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('4010', '4020', '4030', '4040', '4060', '4090') AND cities = KEY('City', :1) AND published = True AND modified > :2 "
                                , city, datetime.now()
                                - timedelta(days=609))
                leisure_count = leisure_count_gql.count(limit=40000)

            elif region and int(region) > 0:
                leisure_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('4010', '4020', '4030', '4040', '4060', '4090') AND regions = KEY('Region', :1) AND published = True AND modified > :2 "
                                , int(region), datetime.now()
                                - timedelta(days=609))
                leisure_count = leisure_count_gql.count(limit=40000)
            memcache.add(key, leisure_count, 36000)
            return leisure_count

    def get_vehicles_count(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'vehiclescount' + str(city)
        data = memcache.get(key)
        vehicles_count = None
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            if city and int(city) > 0:
                vehicles_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('2010', '2030', '2040', '2070', '2080') AND cities = KEY('City', :1) AND published = True AND modified > :2 "
                                , city, datetime.now()
                                - timedelta(days=609))
                vehicles_count = vehicles_count_gql.count(limit=40000)

            elif region and int(region) > 0:
                vehicles_count_gql = \
                    db.GqlQuery("SELECT * FROM Ad WHERE category IN ('2010', '2030', '2040', '2070', '2080') AND regions = KEY('Region', :1) AND published = True AND modified > :2 "
                                , int(region), datetime.now()
                                - timedelta(days=609))
                vehicles_count = vehicles_count_gql.count(limit=40000)
            memcache.add(key, vehicles_count, 36000)
            return vehicles_count


    def get_jobs_count_india(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'jobscountindia' + str(city)
        data = memcache.get(key)
        jobs_count_gql = None
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            jobs_count_gql = \
                db.GqlQuery("SELECT * FROM Ad WHERE category IN ('6010', '6020', '6030', '6040', '6090') AND published = True AND modified > :1 "
                            , datetime.now() - timedelta(days=609))
            jobs_count = jobs_count_gql.count(limit=40000)
            memcache.add(key, jobs_count, 36000)
            return jobs_count

    def get_electronics_count_india(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'electronicscountindia' + str(city)
        data = memcache.get(key)
        electronics_count_gql = None
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            electronics_count_gql = \
                db.GqlQuery("SELECT * FROM Ad WHERE category IN ('5010', '5020', '5030', '5040') AND published = True AND modified > :1 "
                            , datetime.now() - timedelta(days=609))
            electronics_count = electronics_count_gql.count(limit=40000)
            memcache.add(key, electronics_count, 36000)
            return electronics_count


    def get_estate_count_india(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'estatecountindia' + str(city)
        data = memcache.get(key)
        estate_count_gql = None
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            estate_count_gql = \
                db.GqlQuery("SELECT * FROM Ad WHERE category IN ('1010', '1020', '1030', '1050', '1080', '1090', '1100') AND published = True AND modified > :1 "
                            , datetime.now() - timedelta(days=609))
            estate_count = estate_count_gql.count(limit=40000)
            memcache.add(key, estate_count, 36000)
            return estate_count

    def get_home_count_india(self, region, city):
        logging.debug('attempting to fetch data from memchcache for region %s city %s'
                       % (str(region), str(city)))
        key = str(region) + 'homecountindia' + str(city)
        data = memcache.get(key)
        home_count_gql = None
        if data is not None:
            logging.debug('got data %s from memcache', str(data))
            return data
        else:
            home_count_gql = \
                db.GqlQuery("SELECT * FROM Ad WHERE category IN ('3030', '3040', '3050', '3060') AND published = True AND modified > :1 "
                            , datetime.now() - timedelta(days=609))
            home_count = home_count_gql.count(limit=40000)
            memcache.add(key, home_count, 36000)
            return home_count

def get_leisure_count_india(self, region, city):
    logging.debug('attempting to fetch data from memchcache for region %s city %s'
                   % (str(region), str(city)))
    key = str(region) + 'leisurecountindia' + str(city)
    data = memcache.get(key)
    leisure_count_gql = None
    if data is not None:
        logging.debug('got data %s from memcache', str(data))
        return data
    else:
        leisure_count_gql = \
            db.GqlQuery("SELECT * FROM Ad WHERE category IN ('4010', '4020', '4030', '4040', '4060', '4090') AND published = True AND modified > :1 "
                        , datetime.now() - timedelta(days=609))
        leisure_count = leisure_count_gql.count(limit=40000)
        memcache.add(key, leisure_count, 36000)
        return leisure_count

def get_vehicles_count_india(self, region, city):
    logging.debug('attempting to fetch data from memchcache for region %s city %s'
                   % (str(region), str(city)))
    key = str(region) + 'vehiclescountindia' + str(city)
    data = memcache.get(key)
    vehicles_count_gql = None
    if data is not None:
        logging.debug('got data %s from memcache', str(data))
        return data
    else:
        vehicles_count_gql = \
            db.GqlQuery("SELECT * FROM Ad WHERE category IN ('2010', '2030', '2040', '2070', '2080') AND published = True AND modified > :1 "
                        , datetime.now() - timedelta(days=609))
        vehicles_count = vehicles_count_gql.count(limit=40000)
        memcache.add(key, vehicles_count, 36000)
        return vehicles_count

def find_documents(query_string, limit, cursor):
    try:
        date_desc = search.SortExpression(expression='date',
                direction=search.SortExpression.DESCENDING,
                default_value=datetime(1999,01,01))

        hr_desc = search.SortExpression(expression='hour',
                direction=search.SortExpression.DESCENDING,
                default_value=1)

        min_desc = search.SortExpression(expression='minute',
                direction=search.SortExpression.DESCENDING,
                default_value=1)

        # Sort up to ACCURACY matching results by date, hour and minute, descending
        sort = search.SortOptions(expressions=[date_desc, hr_desc,
                                  min_desc], limit=ACCURACY)

        # Set query options
        options = search.QueryOptions(limit=50, cursor=cursor,
                sort_options=sort,
                number_found_accuracy=10000,
              #  returned_fields=['title', 'city', 'region','category', 'adID', 'date','price', 'type', 'company_ad', 'adID', 'cityID','regionID', 'hour','minute'],
             #snippeted_fields=['text']
              )
        query = search.Query(query_string=query_string, options=options)
        index = search.Index(name=_INDEX_NAME)
        logging.debug('query_string i find %s' , str(query.query_string))
        logging.debug('query_options i find %s' , str(query.options))
        # Execute the query
        return index.search(query)

    except search.PutError as e:
        logging.exception('caught PutError %s', e)

    except search.InternalError as e:
        logging.exception('caught InternalError %s', e)

    except search.DeleteError as e:
        logging.exception('caught DeleteError %s', e)

    except search.TransientError as e:
        logging.exception('caught TransientError %s', e)

    except search.InvalidRequest as e:
        logging.exception('caught InvalidError %s', e)

    except search.Error as e:
        logging.exception('caught unknown error  %s', e)

    return None

def mutate_query(self, query):

    # query = query.replace(...) # whatever you are doing here

    query = re.sub("regionID=\d+", '', query)
    to_remove = [
        'category and',
        'type=s',
        'type=w',
        'type=r',
        'type=b',
        'cityID and',
        'and',
        'regionID',
        ]
    for s in to_remove:
        query = query.replace(s, '')
    query = query.replace('=', '%3D')
    query = re.sub("cityID%3D\d+", '', query)
    query = re.sub("category%3D\d+", '', query)
    query = query.replace('  ', ' ')
    return query

I added more logging to the code and updated it:

class India(SearchBaseHandler):

    def get_data(self, key, query):
        data = memcache.get(key)
        if data is not None:
            logging.info('got India data from memcache')
            return data
        else:
            data = find_documents(query, 50, search.Cursor())
            memcache.add(key, data, 36000)
            return data

    def get_count(self, key):
        data = memcache.get(key)
        if data is not None:
            return data
        else:
            data = Ad.all().filter('published =',
                                   True).filter('modified >',
                    datetime.datetime.now()
                    - timedelta(days=609)).count(limit=40000)
            memcache.add(key, data, 36000)
            return data

    def get(self):
        """Handles a get request with a query."""
        regionname = None
        logging.info('get India data')
        country = 'India'
        cursor = self.request.get('cursor')
        region = None
        if self.request.host.find('hipheap') > -1: country = 'USA'
        elif self.request.host.find('koolbusiness') > -1: country = 'India'
        elif self.request.host.find('montao') > -1: country = 'Brasil'
        uri = urlparse(self.request.uri)
        #query = self.request.GET['query']

        query = ''
        if uri.query:
            query = parse_qs(uri.query)
            try:
                query = query['query'][0]
            except KeyError, err:
                query = ''

        key = 'india-adlist'
        results=None
        logging.info('trying to get cached data')
        if cursor:
            logging.info('round-trip with cursor')
            results = find_documents(query, 50, search.Cursor(cursor))
        else:
            results = self.get_data( key, query )
            logging.info('got cached data')

        next_cursor = None
        if results and results.cursor: next_cursor = results.cursor.web_safe_string
        query = query.replace(' and company_ad=0', ''
                              ).replace(' and company_ad=1', '')
        regionname = 'Entire India'
        regionID = 0
        cityID = 0
        form = SearchForm()
        form.w.choices = region_id_to_name
        n_res = self.get_count('count_koolbusiness')
        logging.debug('setting template values')
        template_values = {
            'number_found':n_res,
            'regions':region_id_to_name,
            'form': form,
            'results': results,
            'cursor': next_cursor,
            'region': region,
            'country': country,
            'number_returned': len(results.results) if results else 0,
            'jobs_count': get_jobs_count_india(self, regionID, cityID),
            'estate_count': get_estate_count_india(self, regionID,
                    cityID),
            'electronics_count': get_electronics_count_india(self,
                    regionID, cityID),
            'home_count': get_home_count_india(self, regionID, cityID),
            'leisure_count': get_leisure_count_india(self, regionID,
                    cityID),
            'vehicles_count': get_vehicles_count_india(self, regionID,
                    cityID),
            'user': users.get_current_user(),
            'loggedin': self.logged_in,
            'region': region,
            'regionname': regionname,
            'city': '',
            'cityentity': None,
            'request': self.request,
            'form': SearchForm(),
            'query': query,
            }
        logging.debug('rendering template')
        self.render_template('q.htm', template_values)

The logs then indicate that the template rendering is slow, i.e. the last line of the code, self.render_template:

2013-12-24 15:26:31.490 /india 200 13508ms 16kb Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 module=default version=2014c
I 2013-12-24 15:26:18.148 get India data
I 2013-12-24 15:26:18.148 trying to get cached data
I 2013-12-24 15:26:18.155 got India data from memcache
I 2013-12-24 15:26:18.156 got cached data
D 2013-12-24 15:26:18.158 setting template values
D 2013-12-24 15:26:18.158 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 15:26:18.160 get_jobs_count_india data 10381 from memcache
D 2013-12-24 15:26:18.160 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 15:26:18.163 get_estate_count_india data 930 from memcache
D 2013-12-24 15:26:18.163 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 15:26:18.164 get_electronics_count_india data 817 from memcache
D 2013-12-24 15:26:18.165 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 15:26:18.167 get_home_count_india data 455 from memcache
D 2013-12-24 15:26:18.167 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 15:26:18.169 get_leisure_count_india data 137 from memcache
D 2013-12-24 15:26:18.169 attempting to fetch data from memchcache for region 0 city 0
D 2013-12-24 15:26:18.171 get_vehicles_count_india data 175 from memcache
D 2013-12-24 15:26:18.172 rendering template


Looking at appstats, there still seems to be a round trip on the /india page, but I can't find it.


Now I also get messages from appstats in the log:

2013-12-24 15:45:07.315 /india 200 16962ms 16kb Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 module=default version=2014c
D 2013-12-24 15:44:50.487 rendering template
D 2013-12-24 15:45:07.128 done rendering template
I 2013-12-24 15:45:07.169 Full proto too large to save, cleared variables.
I 2013-12-24 15:45:07.181 Saved; key: __appstats__:090400, part: 120 bytes, full: 217081 bytes, overhead: 0.114 + 0.052; link: http://www.koolbusiness.com/_ah/stats/details?tim

1 answer:

Answer 0 (score: 1)

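memcache is not the guilty party here. If you look at the timestamps in the logs, all the memcache calls complete within the first second after the request is received, so it is probably the rest of the request that is slow. The various calls to get_jobs/estate/electronics/home/leisure/vehicles_count_india look suspicious, and template rendering can also be slow if you pass it a lot of variables, for example when you have a very large search result and you are not paginating.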
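
One way to check where the time actually goes is to time each stage of the handler separately. The following is a minimal sketch, not code from the question: `timed` and the `timings` dict are hypothetical helpers, and the work inside each `with` block is only a stand-in for building the template values and calling `self.render_template`.

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink=None):
    """Log how long the wrapped block takes; optionally record it in `sink`."""
    start = time.time()
    try:
        yield
    finally:
        elapsed_ms = (time.time() - start) * 1000.0
        logging.debug('%s took %.1f ms', label, elapsed_ms)
        if sink is not None:
            sink[label] = elapsed_ms


# Stand-in stages; in a real handler these would wrap the memcache
# lookups and the template rendering call.
timings = {}
with timed('build template values', timings):
    values = {'results': list(range(50))}
with timed('render template', timings):
    html = ''.join(str(v) for v in values['results'])
```

If the rendering stage dominates, as the question's later logs suggest, the next things to try would be passing fewer or smaller objects to the template and paginating the results.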