SQLite many-to-many relationship frequency count

Time: 2019-11-20 20:03:52

Tags: sql sqlite orm subquery

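I have three tables in a SQLite database named Needle, NeedleHaystack, and Haystack. NeedleHaystack is the join table for a many-to-many relationship between the other two. For each unique Needle, I need to calculate how frequently it appears across the unique Haystacks (a percentage is preferred; a count is acceptable).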

  Needle          NeedleHaystack                    Haystack
  +----+-------+  +----+-----------+-------------+  +----+-------+
  | id | value |  | id | needle_id | haystack_id |  | id | value |
  +----+-------+  +----+-----------+-------------+  +----+-------+
  |  1 |  foo1 |  | 1  |    1      |     7       |  | 7  |  bar7 |
  |  2 |  foo2 |  | 2  |    1      |     8       |  | 8  |  bar8 |
  |  3 |  foo3 |  | 3  |    1      |     9       |  | 9  |  bar9 |
  +----+-------+  | 4  |    2      |     7       |  +----+-------+
                  +----+-----------+-------------+

In the end we should get a result like this:

  +-----------+--------------------------+
  | needle_id | frequency_over_haystacks |
  +-----------+--------------------------+
  |    1      |          100%            | // needle id 1 appears in 100% of Haystacks
  |    2      |           33%            | // needle id 2 appears in 33% of Haystacks
  |    3      |            0%            | // needle id 3 appears in no Haystacks
  +-----------+--------------------------+ // on and on, for each needle that may be present...

1 answer:

Answer 0: (score: 1)

I think you basically want aggregation:

select n.id as needle_id,
       count(distinct nh.haystack_id) * 1.0 / h.cnt as frequency_over_haystacks
from needle n left join
     needlehaystack nh
     on n.id = nh.needle_id cross join
     (select count(*) as cnt from haystack) h
group by n.id, h.cnt;

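This allows for duplicate (needle_id, haystack_id) pairs in needlehaystack. If duplicates cannot occur, count(distinct nh.haystack_id) in the select can be replaced with count(nh.haystack_id). Plain count(*) would not work here: the left join produces one all-NULL row for a needle that appears in no haystack, and count(*) would count that row as 1 instead of 0.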
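To check the aggregation against the sample data from the question, here is a minimal sketch using Python's standard `sqlite3` module. The in-memory database, the column types, and the `frequencies` variable are assumptions made for illustration; only the table names and rows come from the question. The query joins on `Needle.id`, because the Needle table as shown has no `needle_id` column, and multiplies by 100.0 to get a percentage.

```python
import sqlite3

# Assumption: an in-memory database populated with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Needle (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE Haystack (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE NeedleHaystack (
        id INTEGER PRIMARY KEY,
        needle_id INTEGER REFERENCES Needle(id),
        haystack_id INTEGER REFERENCES Haystack(id)
    );
    INSERT INTO Needle VALUES (1, 'foo1'), (2, 'foo2'), (3, 'foo3');
    INSERT INTO Haystack VALUES (7, 'bar7'), (8, 'bar8'), (9, 'bar9');
    INSERT INTO NeedleHaystack VALUES (1, 1, 7), (2, 1, 8), (3, 1, 9), (4, 2, 7);
""")

# The LEFT JOIN keeps needles with no haystacks; COUNT(DISTINCT ...) ignores
# the NULL haystack_id on those rows, so they end up at 0.
query = """
    SELECT n.id AS needle_id,
           COUNT(DISTINCT nh.haystack_id) * 100.0 / h.cnt AS frequency_over_haystacks
    FROM Needle n
    LEFT JOIN NeedleHaystack nh ON n.id = nh.needle_id
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM Haystack) h
    GROUP BY n.id, h.cnt
    ORDER BY n.id
"""

# Round to whole percentages for display, as in the question's expected table.
frequencies = {needle_id: round(pct) for needle_id, pct in conn.execute(query)}

for needle_id, pct in frequencies.items():
    print(f"{needle_id}: {pct}%")
```

Rounding is done in Python here only to match the whole-number percentages in the question; the query itself returns floating-point values.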