如何从两个不同的联接表中查询ID的交集计数

时间:2019-07-01 05:37:54

标签: sql elixir ecto

我有几种型号。用户,曲目,播放列表,UserTrack和PlaylistTrack。用户可以具有许多曲目,播放列表也可以。我想查询与用户的UserTracks匹配的曲目最多的播放列表。

这就是我现在的做法:

user_tracks = # get all track_ids for a user in the users_tracks join table

playlists =
  from(p in Playlist,
    join: pt in PlaylistTrack,
    where: pt.playlist_id == p.id,
    having:
      fragment(
        "cardinality(array(
              select unnest(array_agg(?)::varchar[])
              intersect
              select unnest(?::varchar[])
            )) > 0",
        pt.track_id,
        ^user_tracks
      ),
    group_by: p.id,
    order_by:
      fragment(
        "cardinality(array(
              select unnest(array_agg(?)::varchar[])
              intersect
              select unnest(?::varchar[])
            )) DESC",
        pt.track_id,
        ^user_tracks
      ),
    limit: ^limit,
    select: p
  )

这有效,但是速度很慢,有没有更好的方法编写此查询?

简而言之:“如果我们有表users,并且联接表user_tracks中有许多关联,则链接到tracks。表playlists与表tracksusers的assoc有共同的数量

1 个答案:

答案 0 :(得分:0)

我将首先进行最佳的原始SQL查询,然后查看是否/如何将其转换为Ecto。

提供此数据:

CREATE TABLE playlist_tracks (playlist_id integer, track_id integer);
CREATE TABLE playlists (id integer, title character varying);
CREATE TABLE user_tracks (user_id integer, track_id integer);

INSERT INTO user_tracks VALUES (1, 2), (1, 1), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (1, 10);
INSERT INTO playlist_tracks VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (4, 9), (4, 10), (4, 11), (5, 10), (5, 11), (5, 12);
INSERT INTO playlists VALUES (1, 'Classic Rock''s Greatest Hits'),
                             (2, 'Kings & Queens of Blues Rock'),
                             (3, 'Mojito Lounge'),
                             (4, 'Dark Electro Lounging'),
                             (5, 'Atmospheric Chilled Electronica');

我到达的查询如下:

  SELECT p.id,
         p.title
    FROM playlists p
    JOIN playlist_tracks pt on p.id = pt.playlist_id
    JOIN user_tracks ut on pt.track_id = ut.track_id
   WHERE ut.user_id = 1
GROUP BY p.id,
         p.title
ORDER BY count(1) DESC;

请注意,您无需在此处加入users表。

在查询上运行explain analyse ...返回以下内容:

                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=150.61..151.11 rows=200 width=44) (actual time=0.131..0.131 rows=5 loops=1)
   Sort Key: (count(1)) DESC
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=140.97..142.97 rows=200 width=44) (actual time=0.119..0.121 rows=5 loops=1)
         Group Key: p.id, p.title
         ->  Hash Join  (cost=82.25..135.05 rows=789 width=36) (actual time=0.104..0.108 rows=15 loops=1)
               Hash Cond: (p.id = pt.playlist_id)
               ->  Seq Scan on playlists p  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.011..0.012 rows=5 loops=1)
               ->  Hash  (cost=80.70..80.70 rows=124 width=4) (actual time=0.073..0.073 rows=15 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  Hash Join  (cost=38.39..80.70 rows=124 width=4) (actual time=0.042..0.066 rows=15 loops=1)
                           Hash Cond: (pt.track_id = ut.track_id)
                           ->  Seq Scan on playlist_tracks pt  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.005..0.008 rows=18 loops=1)
                           ->  Hash  (cost=38.25..38.25 rows=11 width=4) (actual time=0.020..0.020 rows=10 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                 ->  Seq Scan on user_tracks ut  (cost=0.00..38.25 rows=11 width=4) (actual time=0.007..0.011 rows=10 loops=1)
                                       Filter: (user_id = 1)
 Planning Time: 0.215 ms
 Execution Time: 0.658 ms
(19 rows)

具有以下索引将大大提高查询性能:

create index ut_user_id on user_tracks (user_id);
create index ut_track_id on user_tracks (track_id);
create index pt_track_id on playlist_tracks (track_id);
create index pt_playlist_id on playlist_tracks (playlist_id);
create index p_id on playlists (id);

改进后的计划:

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3.26..3.27 rows=1 width=44) (actual time=0.103..0.103 rows=5 loops=1)
   Sort Key: (count(1)) DESC
   Sort Method: quicksort  Memory: 25kB
   ->  GroupAggregate  (cost=3.23..3.25 rows=1 width=44) (actual time=0.093..0.098 rows=5 loops=1)
         Group Key: p.id, p.title
         ->  Sort  (cost=3.23..3.24 rows=1 width=36) (actual time=0.087..0.089 rows=15 loops=1)
               Sort Key: p.id, p.title
               Sort Method: quicksort  Memory: 26kB
               ->  Nested Loop  (cost=1.27..3.22 rows=1 width=36) (actual time=0.043..0.075 rows=15 loops=1)
                     ->  Hash Join  (cost=1.14..2.39 rows=1 width=4) (actual time=0.037..0.045 rows=15 loops=1)
                           Hash Cond: (pt.track_id = ut.track_id)
                           ->  Seq Scan on playlist_tracks pt  (cost=0.00..1.18 rows=18 width=8) (actual time=0.007..0.010 rows=18 loops=1)
                           ->  Hash  (cost=1.12..1.12 rows=1 width=4) (actual time=0.016..0.016 rows=10 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                 ->  Seq Scan on user_tracks ut  (cost=0.00..1.12 rows=1 width=4) (actual time=0.006..0.009 rows=10 loops=1)
                                       Filter: (user_id = 1)
                     ->  Index Scan using p_id on playlists p  (cost=0.13..0.82 rows=1 width=36) (actual time=0.001..0.001 rows=1 loops=15)
                           Index Cond: (id = pt.playlist_id)
 Planning Time: 0.505 ms
 Execution Time: 0.163 ms
(20 rows)

它返回以下结果:

 id |              title
----+---------------------------------
  3 | Mojito Lounge
  2 | Kings & Queens of Blues Rock
  1 | Classic Rock's Greatest Hits
  4 | Dark Electro Lounging
  5 | Atmospheric Chilled Electronica
(5 rows)

看起来像您所期望的吗?

如果是,则可以将其翻译为Ecto:

import Ecto.Query

user_id = 1

query =
  from p in "playlists",
    select: {p.id, p.title},
    join: pt in "playlist_tracks", on: pt.playlist_id == p.id,
    join: ut in "user_tracks", on: ut.track_id == pt.track_id,
    where: ut.user_id == ^user_id,
    group_by: [p.id, p.title],
    order_by: count(1)

MyRepo.all(query)

您应该可以通过以下方式采用此示例:

  • 交换表名称引用(例如playlistsuser_tracks)用于架构,例如分别PlaylistUser.Track(假设这是您的架构模块的调用方式)

  • select: {p.id, p.title}更改为select: struct(p, [p.id, p.title]),如果需要的话,将为您提供Playlist结构而不是元组(不过经过测试)。 / p>


通常,除非查询必须具有高度的灵活性(例如,取决于运行时确定的许多参数)(通过使用Ecto动态构建查询来实现),否则最好将查询保持为原始SQL。