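I want to speed up a query that runs over a fairly large aggregation. The end result is a set of rankings for some specific categories. The table contains only purchase data, one row per purchase, for example: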
transactions:
| id | category | netvalue | store_id | suburb | city | country |
| 1 | clothes | 20 | 12 | A | AUCK | NZ |
| 2 | food | 10 | 11 | B | WELL | NZ |
| 3 | gear | 120 | 15 | A | CHCH | NZ |
| 4 | clothes | 15 | 9 | C | SYDN | AU |
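I want to get a list of all categories for a particular store, and then rank that store in each category within its own suburb, city, and country. For example, for store 12 I am trying to produce a result set that looks like: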
| category | suburb_rank | city_rank | country_rank |
| clothes | 23 | 20 | 250 |
| food | 27 | 10 | 109 |
...
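To make "rank" concrete, here is a minimal plain-Python sketch of what I mean, using only the four sample rows above. The function name `category_ranks` and its parameters are hypothetical, made up for illustration; they are not part of my actual code.

```python
from collections import defaultdict

# The four sample rows from the transactions table above.
TRANSACTIONS = [
    {"id": 1, "category": "clothes", "netvalue": 20, "store_id": 12,
     "suburb": "A", "city": "AUCK", "country": "NZ"},
    {"id": 2, "category": "food", "netvalue": 10, "store_id": 11,
     "suburb": "B", "city": "WELL", "country": "NZ"},
    {"id": 3, "category": "gear", "netvalue": 120, "store_id": 15,
     "suburb": "A", "city": "CHCH", "country": "NZ"},
    {"id": 4, "category": "clothes", "netvalue": 15, "store_id": 9,
     "suburb": "C", "city": "SYDN", "country": "AU"},
]


def category_ranks(rows, store_id, level, value):
    """Rank `store_id` per category among stores whose `level` equals `value`.

    `level` is one of "suburb", "city" or "country". Rank 1 is the store
    with the highest summed netvalue in that category. Categories the
    store has no sales in are left out of the result.
    """
    # category -> store_id -> summed netvalue, restricted to the region
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if row[level] == value:
            totals[row["category"]][row["store_id"]] += row["netvalue"]

    ranks = {}
    for category, by_store in totals.items():
        if store_id in by_store:
            ordered = sorted(by_store, key=by_store.get, reverse=True)
            ranks[category] = ordered.index(store_id) + 1
    return ranks
```

With only the sample rows, store 12 is the only clothes seller in suburb A, so `category_ranks(TRANSACTIONS, 12, "suburb", "A")` gives it rank 1 for clothes. The real table has many more stores per region, which is where the larger rank values in the result set above come from.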
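I figured I would start with a set of aggregations, grouped by each of the fields I am interested in ranking on. That gives a query like this: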
WITH aggregations AS (
SELECT
category,
SUM(netvalue) AS sum_netvalue,
store_id,
suburb,
city,
country
FROM
transactions
GROUP BY
store_id,
suburb,
city,
country,
category
)
SELECT * FROM aggregations
I then use this aggregation table to build a ranking for each of the ranking columns:
-- A single WITH introduces all of the CTEs; they are separated by commas.
WITH aggregations AS (...),
suburb_rankings AS (
SELECT
row_number() OVER (
PARTITION BY aggregations.category
ORDER BY sum(sum_netvalue) DESC
) AS rank,
category,
store_id
FROM
aggregations
WHERE
suburb = @MY_SUBURB
GROUP BY
category,
store_id
),
city_rankings AS (...),
country_rankings AS (...)
Finally, I join each of these ranking tables onto the list of categories to get the category, followed by the ranks (suburb, city, country):
...
SELECT
categories.category,
suburb_rankings.rank AS suburb_rank,
city_rankings.rank AS city_rank,
country_rankings.rank AS country_rank
FROM
-- Postgres requires an alias on a subquery in FROM; it also disambiguates "category" below.
(SELECT DISTINCT category FROM transactions) AS categories
LEFT JOIN
suburb_rankings
ON
suburb_rankings.category = categories.category AND suburb_rankings.store_id = @MY_STORE_ID
LEFT JOIN
city_rankings
ON
city_rankings.category = categories.category AND city_rankings.store_id = @MY_STORE_ID
LEFT JOIN
country_rankings
ON
country_rankings.category = categories.category AND country_rankings.store_id = @MY_STORE_ID
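For anyone who wants to reproduce the shape of the whole pipeline, here is a small self-contained sketch that runs it against an in-memory SQLite database through Python's `sqlite3` module. It is only an approximation of my setup: it assumes SQLite 3.25 or later (for window functions) instead of Postgres 9.2, it adds two made-up rows (ids 5 and 6) to the sample data, it names the rank column `rnk`, and it skips the second GROUP BY because each store sits in exactly one suburb here. The helper names `ranking_cte`, `build_connection` and `store_rankings` are hypothetical.

```python
import sqlite3

# The four sample rows from the question, plus two hypothetical rows
# (ids 5 and 6) so that store 12 does not rank first everywhere.
ROWS = [
    (1, "clothes", 20, 12, "A", "AUCK", "NZ"),
    (2, "food", 10, 11, "B", "WELL", "NZ"),
    (3, "gear", 120, 15, "A", "CHCH", "NZ"),
    (4, "clothes", 15, 9, "C", "SYDN", "AU"),
    (5, "clothes", 50, 13, "A", "AUCK", "NZ"),
    (6, "clothes", 70, 14, "D", "AUCK", "NZ"),
]


def ranking_cte(name, column):
    """Build one ranking CTE. `column` is a fixed identifier, never user input."""
    return f"""
    {name} AS (
        SELECT category, store_id,
               row_number() OVER (
                   PARTITION BY category ORDER BY sum_netvalue DESC
               ) AS rnk
        FROM aggregations
        WHERE {column} = :{column}
    )"""


QUERY = (
    """
    WITH aggregations AS (
        SELECT category, store_id, suburb, city, country,
               SUM(netvalue) AS sum_netvalue
        FROM transactions
        GROUP BY store_id, suburb, city, country, category
    ),"""
    + ",".join(ranking_cte(f"{c}_rankings", c) for c in ("suburb", "city", "country"))
    + """
    SELECT c.category, s.rnk, ci.rnk, co.rnk
    FROM (SELECT DISTINCT category FROM transactions) AS c
    LEFT JOIN suburb_rankings AS s
        ON s.category = c.category AND s.store_id = :store_id
    LEFT JOIN city_rankings AS ci
        ON ci.category = c.category AND ci.store_id = :store_id
    LEFT JOIN country_rankings AS co
        ON co.category = c.category AND co.store_id = :store_id
    ORDER BY c.category
    """
)


def build_connection():
    """Create an in-memory database holding the sample transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (id INTEGER, category TEXT, netvalue INTEGER,"
        " store_id INTEGER, suburb TEXT, city TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def store_rankings(conn, store_id, suburb, city, country):
    """Return (category, suburb_rank, city_rank, country_rank) rows for one store."""
    params = {"store_id": store_id, "suburb": suburb, "city": city, "country": country}
    return conn.execute(QUERY, params).fetchall()
```

Calling `store_rankings(build_connection(), 12, "A", "AUCK", "NZ")` returns one row per category, with NULL ranks (`None` in Python) for the categories store 12 has no sales in. That is the same LEFT JOIN behaviour as in my query above.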
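I am not sure whether this is the best approach performance-wise: EXPLAIN ANALYZE shows that a lot of the time is spent on the aggregation.
What I would like to know is whether there is a better way to approach this type of query. I can't create a pre-computed aggregation table, because the original table may have more columns that we might want to filter on later.
I am using Postgres 9.2 and SQLAlchemy, and have looked briefly at dogpile caching, but I am not sure whether that is a good solution.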