Question

I have a table of pairs that looks like this:

+---------+----------+
| left_id | right_id |
+---------+----------+
| a       | b        |
+---------+----------+
| a       | c        |
+---------+----------+

And a table of values:

+----+-------+
| id | value |
+----+-------+
| a  | 1     |
+----+-------+
| a  | 2     |
+----+-------+
| a  | 3     |
+----+-------+
| b  | 1     |
+----+-------+
| b  | 4     |
+----+-------+
| b  | 5     |
+----+-------+
| c  | 1     |
+----+-------+
| c  | 2     |
+----+-------+
| c  | 3     |
+----+-------+
| c  | 4     |
+----+-------+

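For each pair, I want to compute and compare the sizes of the union, the intersection, and the set differences (in each direction) of their values, so that the output looks like this: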

+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a       | b        | 5     | 1            | 2         | 2          |
+---------+----------+-------+--------------+-----------+------------+
| a       | c        | 4     | 3            | 0         | 1          |
+---------+----------+-------+--------------+-----------+------------+

What is the best way to solve this with PostgreSQL?

Update: here is a rextester link with the data: https://rextester.com/RWID9864

Answer 1

You need scalar subqueries to do this.

The UNION can also be expressed with an OR, which makes that query a bit shorter to write. For the intersection, however, you need a slightly longer query.

To calculate the "diff", use the EXCEPT operator.
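As a rough sketch of the approach described above (not necessarily the answerer's exact query), the following assumes tables named pairs and "values" as in Answer 2. The column types are a guess, since the question does not state them.

-- Sample data from the question; the column types are an assumption.
create table pairs (left_id text, right_id text);
create table "values" (id text, "value" integer);

insert into pairs values ('a', 'b'), ('a', 'c');
insert into "values" values
  ('a', 1), ('a', 2), ('a', 3),
  ('b', 1), ('b', 4), ('b', 5),
  ('c', 1), ('c', 2), ('c', 3), ('c', 4);

select p.left_id,
       p.right_id,
       -- union: distinct values stored under either id, written with OR
       (select count(distinct v."value")
        from "values" v
        where v.id = p.left_id or v.id = p.right_id) as "union",
       -- intersection: values stored under both ids
       (select count(*)
        from (select "value" from "values" where id = p.left_id
              intersect
              select "value" from "values" where id = p.right_id) i) as intersection,
       -- left_diff: values of left_id that right_id does not have
       (select count(*)
        from (select "value" from "values" where id = p.left_id
              except
              select "value" from "values" where id = p.right_id) l) as left_diff,
       -- right_diff: values of right_id that left_id does not have
       (select count(*)
        from (select "value" from "values" where id = p.right_id
              except
              select "value" from "values" where id = p.left_id) r) as right_diff
from pairs p
order by p.left_id, p.right_id;

On the sample data this should return the same counts as the expected output in the question: 5, 1, 2, 2 for (a, b) and 4, 3, 0, 1 for (a, c). Each scalar subquery is evaluated once per row of pairs, which is fine for small tables but may become slow on large ones.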

Answer 2

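I don't know what is causing your slowness, since I can't see the table sizes or the explain plans. Assuming both tables are large enough to make nested loops inefficient, and not daring to think about joining values to itself, I would try to rewrite it without scalar subqueries, like this: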

select p.*,
       coalesce(stats."union", 0) "union",
       coalesce(stats.intersection, 0) intersection,
       coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
       coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
       select left_id,
              right_id,
              count(*) "union",
              count(has_left and has_right) intersection,
              count(has_left) left_cnt,
              count(has_right) right_cnt
       from (
              select p.*,
                     v."value" the_value,
                     true has_left
              from pairs p
              join "values" v on v.id = p.left_id
       ) l
       full join (
              select p.*,
                     v."value" the_value,
                     true has_right
              from pairs p
              join "values" v on v.id = p.right_id
       ) r using(left_id, right_id, the_value)
       group by left_id,
                right_id
) stats on p.left_id = stats.left_id
       and p.right_id = stats.right_id;

Every join condition here allows a hash and/or merge join, so the planner has a chance to avoid nested loops.
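The intersection count in the query above works because count(expr) only counts rows where the expression is not NULL, and has_left or has_right is NULL on any row that the full join filled in from one side only. A small standalone illustration, using three made-up rows in a VALUES list instead of the real tables:

-- Hypothetical rows as the full join would produce them:
-- a value on both sides, a value on the left only, a value on the right only.
select count(*)                      as "union",
       count(has_left and has_right) as intersection,
       count(has_left)               as left_cnt,
       count(has_right)              as right_cnt
from (values (true, true),
             (true, null),
             (null, true)) as t(has_left, has_right);
-- union = 3, intersection = 1, left_cnt = 2, right_cnt = 2

Here true and null evaluates to NULL, so only the first row contributes to intersection. Subtracting intersection from left_cnt and right_cnt then gives the two set differences, 1 each in this toy case.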

PostgreSQL aggregate union, intersection, and set differences

2 answers: