I have a table like this:
| event id | item 1 id | item 2 id | set |
| 1 | 1 | 2 | 1 |
| 1 | 1 | 3 | 1 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 3 | 1 |
| 1 | 3 | 1 | 1 |
| 1 | 3 | 2 | 1 |
| 1 | 2 | 4 | 2 |
| 1 | 4 | 2 | 2 |
| 2 | 1 | 4 | 3 |
| 2 | 1 | 5 | 3 |
| 2 | 4 | 1 | 3 |
| 2 | 4 | 5 | 3 |
| 2 | 5 | 1 | 3 |
| 2 | 5 | 4 | 3 |
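Now I want to count, separately, how many times item1 occurs on its own and how many times item1 occurs together with item2.
I tried the following: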
with count_item1 AS (
select event_id, item_1_id, count(distinct set) AS c1 from table
group by event_id, item_1_id
), count_item1_and_item2 AS (
select event_id, item_1_id, item_2_id, count(distinct set) AS c2 from table
group by event_id, item_1_id, item_2_id
)
select t1.event_id, t1.item_1_id, t1.item_2_id, t2.c1, t1.c2
from count_item1_and_item2 AS t1
inner join count_item1 AS t2
on t1.event_id=t2.event_id and t1.item_1_id=t2.item_1_id
For the table above, the result should be:
| event id | item 1 id | item 2 id | c1 | c2 |
| 1 | 1 | 2 | 1 | 1 |
| 1 | 1 | 3 | 1 | 1 |
| 1 | 2 | 1 | 2 | 1 |
| 1 | 2 | 3 | 2 | 1 |
| 1 | 3 | 1 | 1 | 1 |
| 1 | 3 | 2 | 1 | 1 |
| 1 | 2 | 4 | 2 | 1 |
| 1 | 4 | 2 | 1 | 1 |
| 2 | 1 | 4 | 1 | 1 |
| 2 | 1 | 5 | 1 | 1 |
| 2 | 4 | 1 | 1 | 1 |
| 2 | 4 | 5 | 1 | 1 |
| 2 | 5 | 1 | 1 | 1 |
| 2 | 5 | 4 | 1 | 1 |
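Each row means: within the given event_id, item1 appears in c1 distinct sets, and the pair (item1, item2) appears in c2 distinct sets. Also, the count for (item1, item2) equals the count for (item2, item1).
Then I noticed something strange: the count for item1 alone should never be smaller than the count for item1 combined with item2, yet sometimes the pair count is larger than the item1 count. Am I doing something wrong? I think my approach is correct, but I am not getting the result I expect. I have been stuck on this all weekend.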
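To make this reproducible, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's `sqlite3` module and runs the same query, then lists any row where c2 exceeds c1. The table name `events`, the column name `set_id`, and the helper `run_counts` are placeholders I made up for this sketch (`table` and `set` are reserved words in most SQL dialects); other databases may behave differently from SQLite.

```python
import sqlite3

# Sample rows from the table above: (event_id, item_1_id, item_2_id, set_id).
# "set_id" stands in for the "set" column; the name is an assumption.
ROWS = [
    (1, 1, 2, 1), (1, 1, 3, 1), (1, 2, 1, 1), (1, 2, 3, 1),
    (1, 3, 1, 1), (1, 3, 2, 1), (1, 2, 4, 2), (1, 4, 2, 2),
    (2, 1, 4, 3), (2, 1, 5, 3), (2, 4, 1, 3), (2, 4, 5, 3),
    (2, 5, 1, 3), (2, 5, 4, 3),
]

# Same logic as the query in the question, with placeholder names.
QUERY = """
WITH count_item1 AS (
    SELECT event_id, item_1_id, COUNT(DISTINCT set_id) AS c1
    FROM events
    GROUP BY event_id, item_1_id
), count_item1_and_item2 AS (
    SELECT event_id, item_1_id, item_2_id, COUNT(DISTINCT set_id) AS c2
    FROM events
    GROUP BY event_id, item_1_id, item_2_id
)
SELECT t1.event_id, t1.item_1_id, t1.item_2_id, t2.c1, t1.c2
FROM count_item1_and_item2 AS t1
INNER JOIN count_item1 AS t2
    ON t1.event_id = t2.event_id AND t1.item_1_id = t2.item_1_id
ORDER BY t1.event_id, t1.item_1_id, t1.item_2_id
"""


def run_counts(rows):
    """Load rows into a throwaway in-memory table and run the count query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_id INTEGER, item_1_id INTEGER, item_2_id INTEGER, set_id INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


results = run_counts(ROWS)

# Rows that break the expectation c2 <= c1 (columns: ..., c1, c2).
violations = [row for row in results if row[4] > row[3]]
print(violations)
```

On the sample rows the check finds no row where c2 exceeds c1, so the sample alone does not reproduce what I see. Running the same check against the real table should show exactly which (event_id, item_1_id, item_2_id) groups are affected.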