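Is there a way to use an index to speed up this multi-table query?

Asked: 2019-05-16 21:40:29

Tags: sql postgresql activerecord indexing postgresql-performance

I'm trying to fetch all tags belonging to all of a user's conversations (a user has many conversations through the ConversationUserPair join model), but the query takes about 2,000 ms on average.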

SELECT "tags"."tag_text_downcased"
FROM "tags"
INNER JOIN "conversations" ON "tags"."conversation_id" = "conversations"."id"
INNER JOIN "conversation_user_pairs" ON "conversations"."id" = "conversation_user_pairs"."conversation_id"
WHERE "conversation_user_pairs"."user_id" = ?
AND "conversation_user_pairs"."conversation_status" = ?
AND ("tags"."user_id" = ?);

When I run EXPLAIN ANALYZE in the psql console, this is the output I get:

EXPLAIN ANALYZE
SELECT "tags"."tag_text_downcased" FROM "tags" INNER JOIN "conversations" ON "tags"."conversation_id" = "conversations"."id" INNER JOIN "conversation_user_pairs" ON "conversations"."id" = "conversation_user_pairs"."conversation_id" WHERE "conversation_user_pairs"."user_id" = '459' AND "conversation_user_pairs"."conversation_status" = 'active' AND ("tags"."user_id" = '459');

Nested Loop  (cost=462.87..486.65 rows=1 width=11) (actual time=0.457..1.886 rows=40 loops=1)
   Join Filter: (tags.conversation_id = conversations.id)
   ->  Merge Join  (cost=462.78..482.97 rows=1 width=19) (actual time=0.401..1.334 rows=40 loops=1)
         Merge Cond: (tags.conversation_id = conversation_user_pairs.conversation_id)
         ->  Sort  (cost=462.70..462.83 rows=259 width=15) (actual time=0.332..0.337 rows=40 loops=1)
               Sort Key: tags.conversation_id
               Sort Method: quicksort  Memory: 27kB
               ->  Bitmap Heap Scan on tags  (cost=4.49..460.62 rows=259 width=15) (actual time=0.152..0.295 rows=40 loops=1)
                     Recheck Cond: (user_id = 459)
                     Heap Blocks: exact=23
                     ->  Bitmap Index Scan on index_tags_on_user_id_and_conversation_id  (cost=0.00..4.47 rows=259 width=0) (actual time=0.105..0.105 rows=40 loops=1)
                           Index Cond: (user_id = 459)
         ->  Index Only Scan using by_user_and_conversation_and_status on conversation_user_pairs  (cost=0.08..20.02 rows=522 width=4) (actual time=0.066..0.956 rows=390 loops=1)
               Index Cond: ((user_id = 459) AND (conversation_status = 'active'::text))
               Heap Fetches: 134
   ->  Index Only Scan using index_conversations_on_id on conversations  (cost=0.08..3.68 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=40)
         Index Cond: (id = conversation_user_pairs.conversation_id)
         Heap Fetches: 40

I believe I have appropriate indexes on each of the three tables involved. I have:

add_index "tags", ["conversation_id", "user_id", "tag_text_downcased"], name: "find_tag_text_downcased_tags"
add_index "tags", ["conversation_id", "user_id"], name: "index_conversation_first_tags"
add_index "tags", ["user_id", "conversation_id"], name: "index_tags_on_user_id_and_conversation_id"

add_index "conversation_user_pairs", ["user_id", "conversation_id", "conversation_status"], name: "by_user_and_conversation_and_status"

add_index "conversations", ["id"], name: "index_conversations_on_id"

It looks as if none of the indexes on these tables are being used to speed up the query. Or is there a way to have a multi-table index?

1 Answer:

Answer 0: (score: 0)

I'm making an educated guess, for lack of information ...

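Query

The query you show doesn't match your stated goal:

"I'm trying to fetch all tags belonging to all of a user's conversations"

I assume "all" is supposed to mean "any".

I also assume that referential integrity is enforced with foreign key constraints. In that case we can cut out the middleman, conversations: every tags.conversation_id already points at an existing conversation, so joining to it only adds cost.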
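The foreign key assumption above could look roughly like the following. This is only a sketch: the constraint names are hypothetical, the column names are taken from the question, and it assumes conversations.id is the primary key (or at least unique). If the Rails schema already defines these constraints, nothing needs to be added.

```sql
-- Hypothetical constraint names; tables and columns come from the question.
-- Assumes conversations.id is the primary key (or has a unique constraint).
ALTER TABLE tags
  ADD CONSTRAINT tags_conversation_id_fkey
  FOREIGN KEY (conversation_id) REFERENCES conversations (id);

ALTER TABLE conversation_user_pairs
  ADD CONSTRAINT conversation_user_pairs_conversation_id_fkey
  FOREIGN KEY (conversation_id) REFERENCES conversations (id);
```

With constraints like these in place, every non-null conversation_id in either table is guaranteed to match exactly one row in conversations, which is why the join to conversations can neither remove nor multiply rows.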

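As written, the query can return the same tag multiple times. Assuming you want unique tags, it is enough to assert that a matching row exists in conversation_user_pairs. An EXISTS semi-join is typically the best way to do this: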

SELECT t.tag_text_downcased
FROM   tags t
WHERE  t.user_id = 459  -- assuming it's a numeric data type
AND    EXISTS (
   SELECT
   FROM   conversation_user_pairs cu
   WHERE  cu.user_id         = t.user_id
   AND    cu.conversation_id = t.conversation_id
   AND    cu.conversation_status = 'active'
   );
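To check whether the rewrite actually helps, the same statement can be run under EXPLAIN, as in the question. The sketch below adds the BUFFERS option so the plan also reports how many pages were read; the literal 459 is the user id from the question, assumed to be numeric.

```sql
-- Note: ANALYZE really executes the query; BUFFERS adds page hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.tag_text_downcased
FROM   tags t
WHERE  t.user_id = 459  -- assuming a numeric data type, as above
AND    EXISTS (
   SELECT
   FROM   conversation_user_pairs cu
   WHERE  cu.user_id         = t.user_id
   AND    cu.conversation_id = t.conversation_id
   AND    cu.conversation_status = 'active'
   );
```

Comparing the reported execution time against the plan in the question should show whether dropping the join to conversations makes a measurable difference.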

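Indexes

Your index find_tag_text_downcased_tags on tags is perfect.
by_user_and_conversation_and_status is a good fit, too. If many rows are not 'active', though, and you are mostly interested in active rows, a partial index would be even better: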

CREATE INDEX ON conversation_user_pairs (user_id, conversation_id)
WHERE conversation_status = 'active';

You don't need any other indexes here. And since you have these two:

add_index "tags", ["conversation_id", "user_id", "tag_text_downcased"], name: "find_tag_text_downcased_tags"
add_index "tags", ["user_id", "conversation_id"], name: "index_tags_on_user_id_and_conversation_id"

... it is typically also pointless to keep this one:

add_index "tags", ["conversation_id", "user_id"], name: "index_conversation_first_tags"

You can drop it.
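In plain SQL, dropping the redundant index might look like this. It is a sketch that assumes the index name from the question exists in the default schema; in a Rails project the equivalent change would normally go through a migration instead.

```sql
-- CONCURRENTLY avoids blocking writes on tags while the index is removed,
-- but it cannot be run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS index_conversation_first_tags;
```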

Aside: if conversation_status only takes values like 'active' and 'dead' or similar, make it a boolean column. It is smaller and cheaper than text.
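One possible way to make that switch is sketched below. The column name is_active is hypothetical, and the sketch assumes every status other than 'active' (including NULL) should count as inactive. Application code and existing indexes that reference conversation_status would have to be adapted separately.

```sql
-- Hypothetical column name: is_active.
ALTER TABLE conversation_user_pairs ADD COLUMN is_active boolean;

-- Backfill from the existing text column; NULL statuses become false.
UPDATE conversation_user_pairs
SET    is_active = COALESCE(conversation_status = 'active', false);

ALTER TABLE conversation_user_pairs ALTER COLUMN is_active SET NOT NULL;

-- The partial index from above, expressed on the boolean column.
CREATE INDEX ON conversation_user_pairs (user_id, conversation_id)
WHERE is_active;
```

On a large table the UPDATE rewrites every row, so it may be worth batching it or running it during a quiet period.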