I have a table that stores a JSONB column (data) containing Facebook-like data. The data is structured as follows:
-
id | 9403
kind | 'likes'
data | [{ id: "1", name: "Pluto", category: "Planet"}, { id: "2", name: "Saturn", category: "Planet" }]
-
id | 9403
kind | 'likes'
data | [{ id: "2", name: "Neptune", category: "Planet"}, { id: "3", name: "Mars", category: "Planet" }]
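To make the two sample records above reproducible, here is a minimal sketch of a matching table. Only the table name metadata, the column names, and the JSONB type of data come from the question. The types of id and kind are assumptions (the plan's (kind)::text cast suggests a varchar column), and the keys are quoted here because jsonb accepts only valid JSON.

```sql
-- Hypothetical schema: column types other than "data jsonb" are assumptions.
CREATE TABLE metadata (
    id   integer,  -- no primary key: both sample rows share id 9403
    kind varchar,  -- assumed from the (kind)::text cast in the plan
    data jsonb     -- top-level value is an array of like objects
);

-- The two sample rows, rewritten as valid JSON (keys quoted).
INSERT INTO metadata (id, kind, data) VALUES
    (9403, 'likes', '[{"id": "1", "name": "Pluto", "category": "Planet"}, {"id": "2", "name": "Saturn", "category": "Planet"}]'),
    (9403, 'likes', '[{"id": "2", "name": "Neptune", "category": "Planet"}, {"id": "3", "name": "Mars", "category": "Planet"}]');
```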
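The goal is to write a query that aggregates the top N (5) likes per category. I have the following subquery, and I am not sure how to optimize it (with an index or by rewriting it). It needs to group by name and category so that the results can be ranked. I started with the simpler problem of efficiently selecting the N most popular likes overall: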
SELECT
likes.entry->>'name' AS name,
likes.entry->>'category' AS category,
COUNT(*) AS count
FROM (SELECT json_array_elements(metadata.data::JSON) AS entry FROM metadata WHERE metadata.kind = 'likes') AS likes
GROUP BY name, category
ORDER BY count DESC
LIMIT 5
This query already takes 5 seconds to run (EXPLAIN ANALYZE output pasted below):
Limit (cost=39971.07..39971.07 rows=5 width=32) (actual time=5468.952..5468.954 rows=5 loops=1)
-> Sort (cost=39971.07..39971.17 rows=200 width=32) (actual time=5468.952..5468.954 rows=5 loops=1)
Sort Key: (count(*))
Sort Method: top-N heapsort Memory: 25kB
-> HashAggregate (cost=39969.61..39970.41 rows=200 width=32) (actual time=5241.143..5376.502 rows=392515 loops=1)
Group Key: (likes.entry ->> 'name'::text), (likes.entry ->> 'category'::text)
-> Subquery Scan on likes (cost=0.00..34491.46 rows=3652100 width=32) (actual time=0.104..4552.531 rows=880073 loops=1)
-> Seq Scan on metadata (cost=0.00..19883.06 rows=3652100 width=703) (actual time=0.097..2146.678 rows=880073 loops=1)
Filter: ((kind)::text = 'likes'::text)
Rows Removed by Filter: 90145
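Can I restructure this to make it faster, or add some indexes, without resorting to a materialized view? I tried adding the following indexes, which turned out to be useless: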
CREATE INDEX index_metadata_on_likes_raw ON metadata USING gin(data) WHERE (kind = 'likes');
CREATE INDEX index_metadata_on_likes_targeted ON metadata ((data ->> 'name'), (data ->> 'category')) WHERE (kind = 'likes');
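A likely reason the second index cannot help: data holds an array at the top level, so data ->> 'name' looks up an object key on an array and yields NULL for every row. The GIN index serves containment and key-existence operators such as @>, which the aggregate query never uses. The sketch below shows the NULL result using a literal value in place of the data column.

```sql
-- Literal values stand in for the "data" column; no table is required.

-- Object-key lookup on a top-level array finds nothing.
SELECT '[{"name": "Pluto", "category": "Planet"}]'::jsonb ->> 'name' AS from_array;          -- NULL

-- Selecting an array element first does reach the key.
SELECT '[{"name": "Pluto", "category": "Planet"}]'::jsonb -> 0 ->> 'name' AS from_element_0; -- Pluto
```

Note also that the plan filters out only 90145 rows while keeping 880073, so the partial predicate kind = 'likes' matches most of the table and a sequential scan is the expected choice either way.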
Answer 0 (score: 0)
Try this:
SELECT name, category, COUNT(*) AS count FROM (
  SELECT jsonb_array_elements(test.data::JSONB)->>'name' AS name,
         jsonb_array_elements(test.data::JSONB)->>'category' AS category
  FROM test WHERE test.kind = 'likes') a
GROUP BY name, category
ORDER BY count DESC
LIMIT 5;
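The question's stated end goal is the top 5 likes within each category, not the top 5 overall. One way that could be sketched, assuming the question's metadata table and PostgreSQL 9.4 or later, is to unnest each array once with a lateral jsonb_array_elements call and then rank the counts with a window function. The names counted, ranked, like_count, and rank_in_category are illustrative, not from the original post. Whether this runs faster than the queries above is untested here; it still has to read and unnest every 'likes' row.

```sql
WITH counted AS (
    -- Unnest each array once and count occurrences per (name, category).
    SELECT
        entry->>'name'     AS name,
        entry->>'category' AS category,
        COUNT(*)           AS like_count
    FROM metadata
    CROSS JOIN LATERAL jsonb_array_elements(metadata.data) AS likes(entry)
    WHERE metadata.kind = 'likes'
    GROUP BY 1, 2
),
ranked AS (
    -- Number the rows inside each category, most-liked first;
    -- ties are broken by name so the result is deterministic.
    SELECT
        name,
        category,
        like_count,
        ROW_NUMBER() OVER (
            PARTITION BY category
            ORDER BY like_count DESC, name
        ) AS rank_in_category
    FROM counted
)
SELECT name, category, like_count
FROM ranked
WHERE rank_in_category <= 5
ORDER BY category, like_count DESC, name;
```

Replacing ROW_NUMBER() with RANK() would keep every like that ties for fifth place, at the cost of possibly returning more than 5 rows per category.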