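I have three tables in a SQLite database, called Needle, NeedleHaystack and Haystack, which form a many-to-many relationship through a join table. For each unique item in Needle, I need to calculate how frequently it appears across the unique Haystacks (a percentage is preferred, a count is acceptable).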
Needle
+----+-------+
| id | value |
+----+-------+
|  1 | foo1  |
|  2 | foo2  |
|  3 | foo3  |
+----+-------+

NeedleHaystack
+----+-----------+-------------+
| id | needle_id | haystack_id |
+----+-----------+-------------+
|  1 |         1 |           7 |
|  2 |         1 |           8 |
|  3 |         1 |           9 |
|  4 |         2 |           7 |
+----+-----------+-------------+

Haystack
+----+-------+
| id | value |
+----+-------+
|  7 | bar7  |
|  8 | bar8  |
|  9 | bar9  |
+----+-------+
The end result should look like this:
+-----------+--------------------------+
| needle_id | frequency_over_haystacks |
+-----------+--------------------------+
|         1 |                     100% | // needle id 1 appears in 100% of Haystacks
|         2 |                      33% | // needle id 2 appears in 33% of Haystacks
|         3 |                       0% | // needle id 3 appears in no Haystacks
+-----------+--------------------------+ // on and on, for each needle that may be present...
Answer 0 (score: 1)
I think you basically want an aggregation:
select n.id as needle_id,
       count(distinct nh.haystack_id) * 1.0 / h.cnt as frequency_over_haystacks
from needle n left join
     needlehaystack nh
     on n.id = nh.needle_id cross join
     (select count(*) as cnt from haystack) h
group by n.id, h.cnt;
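To check the query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The column types and the frequency alias are assumptions made for illustration. The join uses needle.id because the question's Needle table has an id column, not needle_id.

```python
import sqlite3

# In-memory database populated with the sample rows from the question.
# Column types are assumed; the question only shows names and values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table needle (id integer primary key, value text);
    create table haystack (id integer primary key, value text);
    create table needlehaystack (
        id integer primary key,
        needle_id integer references needle(id),
        haystack_id integer references haystack(id)
    );
    insert into needle values (1, 'foo1'), (2, 'foo2'), (3, 'foo3');
    insert into haystack values (7, 'bar7'), (8, 'bar8'), (9, 'bar9');
    insert into needlehaystack values (1, 1, 7), (2, 1, 8), (3, 1, 9), (4, 2, 7);
""")

# Distinct haystacks per needle, divided by the total number of haystacks.
# The left join keeps needles that have no rows in needlehaystack.
QUERY = """
    select n.id as needle_id,
           count(distinct nh.haystack_id) * 1.0 / h.cnt as frequency
    from needle n
    left join needlehaystack nh on n.id = nh.needle_id
    cross join (select count(*) as cnt from haystack) h
    group by n.id, h.cnt
    order by n.id
"""

rows = conn.execute(QUERY).fetchall()
for needle_id, frequency in rows:
    # Format the fraction as a whole-number percentage.
    print(f"{needle_id}: {frequency:.0%}")
```

With the sample data this prints 100%, 33% and 0% for needles 1, 2 and 3. The query returns a fraction between 0 and 1, and the percentage formatting is left to the client here.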
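The count(distinct nh.haystack_id) in this query tolerates duplicate rows in needlehaystack. If duplicates cannot occur, you can use count(nh.haystack_id) rather than count(distinct nh.haystack_id) in the select. Avoid a plain count(*) here: the left join produces one row with NULLs for a needle that has no haystacks, and count(*) would count that row as 1.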
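The difference between the counting variants can be seen in a small sketch, again using Python's sqlite3 module. The data is hypothetical: it is the question's sample plus one deliberately duplicated link row (needle 2 linked to haystack 7 twice), and the column aliases are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table needle (id integer primary key, value text);
    create table needlehaystack (
        id integer primary key,
        needle_id integer,
        haystack_id integer
    );
    insert into needle values (1, 'foo1'), (2, 'foo2'), (3, 'foo3');
    -- Needle 2 is linked to haystack 7 twice (a duplicate); needle 3 has no links.
    insert into needlehaystack values
        (1, 1, 7), (2, 1, 8), (3, 1, 9), (4, 2, 7), (5, 2, 7);
""")

# Compare the three ways of counting on the same left join.
counts = conn.execute("""
    select n.id,
           count(distinct nh.haystack_id) as distinct_links,  -- ignores duplicates
           count(nh.haystack_id)          as all_links,       -- counts duplicates, skips NULLs
           count(*)                       as joined_rows      -- also counts the NULL row
    from needle n
    left join needlehaystack nh on n.id = nh.needle_id
    group by n.id
    order by n.id
""").fetchall()

for row in counts:
    print(row)
```

Needle 2 shows 1 distinct link but 2 under the other two counts, and needle 3 shows 0 links but 1 joined row. This is why count(distinct ...) is the safe default unless the join table is known to be free of duplicates.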