PostgreSQL: aggregate union, intersection and set differences

Time: 2018-11-19 21:19:50

Tags: sql postgresql

I have a table of pairs to aggregate, as follows:

+---------+----------+
| left_id | right_id |
+---------+----------+
| a       | b        |
+---------+----------+
| a       | c        |
+---------+----------+

And a table of values:

+----+-------+
| id | value |
+----+-------+
| a  | 1     |
+----+-------+
| a  | 2     |
+----+-------+
| a  | 3     |
+----+-------+
| b  | 1     |
+----+-------+
| b  | 4     |
+----+-------+
| b  | 5     |
+----+-------+
| c  | 1     |
+----+-------+
| c  | 2     |
+----+-------+
| c  | 3     |
+----+-------+
| c  | 4     |
+----+-------+

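For each pair, I want to compare the two sets of values and compute the size of their union, their intersection and their set differences (in each direction), so that the output looks like this: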

+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a       | b        | 5     | 1            | 2         | 2          |
+---------+----------+-------+--------------+-----------+------------+
| a       | c        | 4     | 3            | 0         | 1          |
+---------+----------+-------+--------------+-----------+------------+

What is the best way to approach this with PostgreSQL?

Update: here is a rextester link with the data: https://rextester.com/RWID9864
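
For anyone reproducing this locally, a minimal setup sketch could look like the following. The table names `pairs` and `"values"` are taken from the query in the answers below; the column types are assumptions, since the question does not state them.

```sql
-- Assumed schema: table names follow the answers' queries,
-- column types are a guess (text ids, integer values).
create table pairs (
    left_id  text not null,
    right_id text not null
);

-- "values" is a reserved word in PostgreSQL, so it has to be quoted.
create table "values" (
    id      text    not null,
    "value" integer not null
);

insert into pairs (left_id, right_id) values
    ('a', 'b'),
    ('a', 'c');

insert into "values" (id, "value") values
    ('a', 1), ('a', 2), ('a', 3),
    ('b', 1), ('b', 4), ('b', 5),
    ('c', 1), ('c', 2), ('c', 3), ('c', 4);
```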

2 answers:

Answer 0: (score: 1)

You need scalar subqueries that do this.

The UNION can also be expressed with an OR condition, which makes that part of the query a bit shorter to write. For the intersection, however, you need a slightly longer query.

To calculate the "diff" columns, use the EXCEPT operator.
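
A minimal sketch of the scalar-subquery approach described above, assuming the tables are named `pairs` and `"values"` as in the other answer and that each (id, value) combination appears only once:

```sql
-- Sketch only: table and column names are assumed from the question and the other answer.
select p.left_id,
       p.right_id,
       -- union: distinct values belonging to either id (the OR form)
       (select count(distinct v."value")
          from "values" v
         where v.id = p.left_id
            or v.id = p.right_id) as "union",
       -- intersection: values present for both ids
       (select count(*)
          from (select "value" from "values" where id = p.left_id
                intersect
                select "value" from "values" where id = p.right_id) t) as intersection,
       -- left_diff: values of left_id that right_id does not have
       (select count(*)
          from (select "value" from "values" where id = p.left_id
                except
                select "value" from "values" where id = p.right_id) t) as left_diff,
       -- right_diff: values of right_id that left_id does not have
       (select count(*)
          from (select "value" from "values" where id = p.right_id
                except
                select "value" from "values" where id = p.left_id) t) as right_diff
from pairs p;
```

With the sample data from the question this yields 5, 1, 2, 2 for the pair (a, b) and 4, 3, 0, 1 for the pair (a, c), matching the desired output. Each subquery is evaluated once per row of `pairs`, which is simple to read but may become slow on large tables.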

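Answer 1: (score: 1)

I can't tell what is making your query slow, because I can't see the table sizes or the explain plan. Assuming both tables are large enough that nested loops are inefficient, and that joining values to itself is best avoided, I would try rewriting it without scalar subqueries, like this: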

select p.*,
       coalesce(stats."union", 0) "union",
       coalesce(stats.intersection, 0) intersection,
       coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
       coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
       select left_id,
              right_id,
              count(*) "union",
              count(has_left and has_right) intersection,
              count(has_left) left_cnt,
              count(has_right) right_cnt
       from (
              select p.*,
                     v."value" the_value,
                     true has_left
              from pairs p
              join "values" v on v.id = p.left_id
       ) l
       full join (
              select p.*,
                     v."value" the_value,
                     true has_right
              from pairs p
              join "values" v on v.id = p.right_id
       ) r using(left_id, right_id, the_value)
       group by left_id,
                right_id
) stats on p.left_id = stats.left_id
       and p.right_id = stats.right_id;

Every join condition here allows a hash and/or merge join, so the planner has a chance to avoid nested loops.
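
One way to check which join strategy the planner actually picks is to prefix a query with EXPLAIN. The sketch below uses one of the inner joins from the query above as a stand-in; the same prefix can be applied to the full statement. The plan you get depends on your table sizes and statistics, so no particular output is implied here.

```sql
-- Sketch: ANALYZE actually executes the query and reports real timings,
-- BUFFERS adds buffer usage per plan node.
explain (analyze, buffers)
select p.left_id,
       p.right_id,
       v."value" as the_value
from pairs p
join "values" v on v.id = p.left_id;
```

Look for "Hash Join" or "Merge Join" nodes in the output rather than "Nested Loop" when the tables are large.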