How do I efficiently join multiple tables on the same ID?

Posted: 2015-02-06 18:05:27

Tags: mysql sql ruby-on-rails postgresql join

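I'm trying to build a basic reporting structure. I have an id in table1 that is essentially a user_id. Whenever the user does something or is associated with something, other tables record those actions against that id.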

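I want to get all the table1 records for a day, together with how many records in each of these other tables are associated with them. For example, this user has 10 widgets, 15 sign_ins, and 20 generic_actions.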

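Here is my query. It gives the correct results, but it is extremely inefficient. Without DISTINCT it returns about 6 million rows, when there should only be a few thousand.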

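In effect, if I run this for a single user with the counts above, I get 1 * 10 * 15 * 20 rows back. What I really want is one row with the count from each table. Yes, I know I can use count distinct, but the join still has to churn through so many rows that it's prohibitive. Is there a join type, or something else I'm missing, that would let me join these efficiently without all the extra rows?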

SELECT 
 DISTINCT DATE_TRUNC('day',table1.created_at) as c_date,
 count(distinct table1.id) as t1_tot,
 count(distinct table2.id) as t2_tot,
 count(distinct table3.id) as t3_tot,
 count(distinct table4.id) as t4_tot,
 count(distinct table5.id) as t5_tot
FROM 
 table1 
LEFT JOIN 
 table2 ON table1.id = table2.t1_id 
LEFT JOIN 
 table3 ON table1.id = table3.t1_id 
LEFT JOIN 
 table4 ON table1.id = table4.t1_id 
LEFT JOIN 
 table5 ON table1.id = table5.t1_id  
WHERE 
 (table1.created_at >= '2015-02-02' AND table1.created_at <= '2015-02-05') 
GROUP BY c_date  
ORDER BY c_date desc
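The sketch below shows where the extra rows come from. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, with a hypothetical schema and data sized to the 10 widgets / 15 sign_ins / 20 generic_actions example. Only three child tables are used for brevity. It counts the rows the LEFT JOINs produce for a single table1 row, then shows that COUNT(DISTINCT ...) still recovers the per-table numbers.

```python
import sqlite3

# Hypothetical miniature schema mirroring the question: child tables point at
# table1.id through t1_id. SQLite is only a stand-in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, t1_id INTEGER);
CREATE TABLE table3 (id INTEGER PRIMARY KEY, t1_id INTEGER);
CREATE TABLE table4 (id INTEGER PRIMARY KEY, t1_id INTEGER);
""")

# One user with 10, 15 and 20 related rows (the counts from the question).
conn.execute("INSERT INTO table1 VALUES (1, '2015-02-03 10:00:00')")
for table, n in (("table2", 10), ("table3", 15), ("table4", 20)):
    conn.executemany(f"INSERT INTO {table} (t1_id) VALUES (?)", [(1,)] * n)

JOINS = """
FROM table1
LEFT JOIN table2 ON table1.id = table2.t1_id
LEFT JOIN table3 ON table1.id = table3.t1_id
LEFT JOIN table4 ON table1.id = table4.t1_id
"""

# Same join shape as the question's query: every child row of table3 is
# paired with every child row of table2, and so on.
joined_rows = conn.execute("SELECT COUNT(*) " + JOINS).fetchone()[0]

# COUNT(DISTINCT ...) gives the right answer, but only after all of the
# intermediate rows above have been produced.
distinct_counts = conn.execute(
    "SELECT COUNT(DISTINCT table2.id), COUNT(DISTINCT table3.id), "
    "COUNT(DISTINCT table4.id) " + JOINS
).fetchone()

print(joined_rows)      # 3000 = 1 * 10 * 15 * 20
print(distinct_counts)  # (10, 15, 20)
```

The cost is in the joins themselves. DISTINCT only deduplicates after the multiplied rows exist, so adding it does not make the query cheaper.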

Is there actually a way to get what I want with joins? The query is so expensive that it times out.

I'm using Postgres and Rails, and all of these tables are set up as models with associations.

Update: I tested Andrew's suggestion from the comments. Here is the query plan:

Unique  (cost=1062356.31..1062384.74 rows=2843 width=16)
   ->  Sort  (cost=1062356.31..1062363.42 rows=2843 width=16)
     Sort Key: (date_trunc('day'::text, table1.created_at)), (count(table1.id)), (count(table1_1.id))
     ->  HashAggregate  (cost=1062157.68..1062193.22 rows=2843 width=16)
           ->  Merge Right Join  (cost=0.58..1062136.35 rows=2845 width=16)
                 Merge Cond: (table1_1.id = table1.id)
                 ->  GroupAggregate  (cost=0.29..1059054.94 rows=41399 width=4)
                       ->  Nested Loop  (cost=0.29..756842.24 rows=60359742 width=4)
                             ->  Index Only Scan using table1_pkey on table1 table1_1  (cost=0.29..2314.24 rows=41399 width=4)
                             ->  Materialize  (cost=0.00..34.87 rows=1458 width=0)
                                   ->  Seq Scan on table2  (cost=0.00..27.58 rows=1458 width=0)
                 ->  Index Scan using table1_pkey on table1  (cost=0.29..2521.24 rows=2845 width=12)
                       Filter: ((created_at >= '2015-02-02 00:00:00'::timestamp without time zone) AND (created_at <= '2015-02-05 00:00:00'::timestamp without time zone))

2 Answers:

Answer 0 (score: 0)

  

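"In effect, if I run this for a single user with the counts above, I get 1 * 10 * 15 * 20 rows back, when what I really want is one row with how many there are in each."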

If you're only looking for the number of ids in each table, you don't need to join them all together. This query might work for you:

WITH t1 AS (
 SELECT id, DATE_TRUNC('day', created_at) AS c_date
 FROM table1
 WHERE created_at >= '2015-02-02' AND created_at <= '2015-02-05'
)
SELECT 
 c_date,
 SUM(CASE WHEN TableId = 'Table1' THEN IdCount ELSE 0 END) AS t1_tot,
 SUM(CASE WHEN TableId = 'Table2' THEN IdCount ELSE 0 END) AS t2_tot,
 SUM(CASE WHEN TableId = 'Table3' THEN IdCount ELSE 0 END) AS t3_tot,
 SUM(CASE WHEN TableId = 'Table4' THEN IdCount ELSE 0 END) AS t4_tot,
 SUM(CASE WHEN TableId = 'Table5' THEN IdCount ELSE 0 END) AS t5_tot
FROM 
(
SELECT c_date, COUNT(id) AS IdCount, 'Table1' AS TableId FROM t1 GROUP BY c_date UNION ALL
SELECT t1.c_date, COUNT(table2.id), 'Table2' FROM table2 JOIN t1 ON t1.id = table2.t1_id GROUP BY t1.c_date UNION ALL
SELECT t1.c_date, COUNT(table3.id), 'Table3' FROM table3 JOIN t1 ON t1.id = table3.t1_id GROUP BY t1.c_date UNION ALL
SELECT t1.c_date, COUNT(table4.id), 'Table4' FROM table4 JOIN t1 ON t1.id = table4.t1_id GROUP BY t1.c_date UNION ALL
SELECT t1.c_date, COUNT(table5.id), 'Table5' FROM table5 JOIN t1 ON t1.id = table5.t1_id GROUP BY t1.c_date
) AS innerTable
GROUP BY c_date  
ORDER BY c_date desc

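I haven't tested it, but it should give you an idea of what to do. Since you're not joining everything together, it should run faster. innerTable returns each table's id count as a separate row, and the outer SELECT pivots those rows into a single row (per day), which is what you want.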

Getting the row counts is easy; most of the work is in the transposing.
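Below is a minimal sketch of this count-separately-then-pivot approach. It again uses Python's sqlite3 as a stand-in for PostgreSQL, so date() replaces DATE_TRUNC('day', ...). The data is hypothetical and only two child tables are shown. One assumption: each child row is counted against the day its table1 parent was created. Each child table is therefore joined to the filtered table1 rows, but never to the other child tables.

```python
import sqlite3

# Hypothetical data; SQLite stands in for PostgreSQL and date() plays the
# role of DATE_TRUNC('day', ...). Only two child tables are shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, t1_id INTEGER);
CREATE TABLE table3 (id INTEGER PRIMARY KEY, t1_id INTEGER);
INSERT INTO table1 VALUES (1, '2015-02-03 09:00:00'),
                          (2, '2015-02-03 17:00:00'),
                          (3, '2015-02-04 12:00:00');
INSERT INTO table2 (t1_id) VALUES (1), (1), (2), (3);
INSERT INTO table3 (t1_id) VALUES (1), (3), (3), (3);
""")

PIVOT_QUERY = """
WITH t1 AS (
  -- table1 rows in the report window, tagged with their day
  SELECT id, date(created_at) AS c_date
  FROM table1
  WHERE created_at >= '2015-02-02' AND created_at <= '2015-02-05'
)
SELECT
  c_date,
  SUM(CASE WHEN table_id = 'table1' THEN id_count ELSE 0 END) AS t1_tot,
  SUM(CASE WHEN table_id = 'table2' THEN id_count ELSE 0 END) AS t2_tot,
  SUM(CASE WHEN table_id = 'table3' THEN id_count ELSE 0 END) AS t3_tot
FROM (
  -- One row per (day, table): each branch aggregates a single child table,
  -- so rows from different child tables are never multiplied together.
  SELECT t1.c_date AS c_date, COUNT(*) AS id_count, 'table1' AS table_id
    FROM t1 GROUP BY t1.c_date
  UNION ALL
  SELECT t1.c_date, COUNT(*), 'table2'
    FROM table2 JOIN t1 ON t1.id = table2.t1_id GROUP BY t1.c_date
  UNION ALL
  SELECT t1.c_date, COUNT(*), 'table3'
    FROM table3 JOIN t1 ON t1.id = table3.t1_id GROUP BY t1.c_date
) AS inner_table
GROUP BY c_date
ORDER BY c_date DESC
"""

rows = conn.execute(PIVOT_QUERY).fetchall()
for row in rows:
    print(row)  # ('2015-02-04', 1, 1, 3) then ('2015-02-03', 2, 3, 1)
```

Each UNION ALL branch touches at most two tables. The intermediate row count therefore grows with the number of child rows rather than with their product.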

Answer 1 (score: 0)

You should break the query up into several subqueries. That is, instead of

SELECT SUM1, SUM2, SUM3, SUM4 FROM (A Join B Join C Join D)

it should look like

SELECT SQ1.SUM1, SQ1.SUM2, SQ2.SUM3, SQ3.SUM4 FROM
(SELECT SUM1, SUM2 FROM A JOIN B) SQ1 CROSS JOIN
(SELECT SUM1, SUM3 FROM A JOIN C) SQ2 CROSS JOIN
(SELECT SUM1, SUM4 FROM A JOIN D) SQ3

Note that this will be long and ugly, but it is very fast, because each subquery returns only one row.
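When each subquery returns exactly one row, CROSS JOIN is enough to combine them. With the question's per-day grouping, each subquery returns one row per day instead, so the subqueries have to be joined on the day. Below is a minimal sketch of that variant. It uses Python's sqlite3 as a stand-in for PostgreSQL with hypothetical data. Table names follow the question, and date() replaces DATE_TRUNC('day', ...).

```python
import sqlite3

# Hypothetical data; SQLite stands in for PostgreSQL and date() for
# DATE_TRUNC('day', ...). Only two child tables are shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, t1_id INTEGER);
CREATE TABLE table3 (id INTEGER PRIMARY KEY, t1_id INTEGER);
INSERT INTO table1 VALUES (1, '2015-02-03 09:00:00'),
                          (2, '2015-02-03 17:00:00'),
                          (3, '2015-02-04 12:00:00');
INSERT INTO table2 (t1_id) VALUES (1), (1), (2), (3);
INSERT INTO table3 (t1_id) VALUES (1), (3), (3), (3);
""")

QUERY = """
WITH t1 AS (
  SELECT id, date(created_at) AS c_date
  FROM table1
  WHERE created_at >= '2015-02-02' AND created_at <= '2015-02-05'
)
SELECT sq1.c_date,
       sq1.t1_tot,
       COALESCE(sq2.t2_tot, 0) AS t2_tot,
       COALESCE(sq3.t3_tot, 0) AS t3_tot
FROM (SELECT t1.c_date AS c_date, COUNT(*) AS t1_tot
      FROM t1 GROUP BY t1.c_date) AS sq1
-- Each subquery is reduced to one row per day before the joins, so every
-- join matches at most one row on each side.
LEFT JOIN (SELECT t1.c_date AS c_date, COUNT(*) AS t2_tot
           FROM table2 JOIN t1 ON t1.id = table2.t1_id
           GROUP BY t1.c_date) AS sq2 ON sq2.c_date = sq1.c_date
LEFT JOIN (SELECT t1.c_date AS c_date, COUNT(*) AS t3_tot
           FROM table3 JOIN t1 ON t1.id = table3.t1_id
           GROUP BY t1.c_date) AS sq3 ON sq3.c_date = sq1.c_date
ORDER BY sq1.c_date DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # ('2015-02-04', 1, 1, 3) then ('2015-02-03', 2, 3, 1)
```

The LEFT JOIN from the table1 subquery keeps days that have no rows in a child table, and COALESCE turns the resulting NULL into 0.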