Calculating overlap between groups

Time: 2015-06-16 16:34:56

Tags: sql oracle oracle11g pivot

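I have a table with two columns of interest, item_id and bucket_id. bucket_id has a fixed number of values, and I can list them out if needed.

Each item_id can appear multiple times, but each occurrence will have a distinct bucket_id value. For example, the item_id 123 can appear in the table twice, once under a bucket_id of A and once under B.

My goal is to determine how much overlap exists between each pair of bucket_id values, and to display it as an N-by-N matrix.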

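For example, consider the following small example table:

item_id    bucket_id
=========  ===========
111        A
111        B
111        C
222        B
222        D
333        A
333        C
444        C

So for this dataset, buckets A and B have one item_id in common, buckets C and D have no items in common, and so on.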

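I would like to format the table above into the following:

     A    B    C    D
===================================
A    2    1    2    0
B    1    2    1    1
C    2    1    3    0
D    0    1    0    1

In the table above, the intersection of a row and a column tells you how many records exist in both bucket_id values. For example, where the A row intersects the C column we have a 2, because there are 2 records in both bucket_id A and C. Because the intersection of X and Y is the same as that of Y and X, the table is mirrored across the diagonal.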
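To make the definition of each cell concrete, here is a minimal sketch in Python rather than SQL. It treats each bucket as a set of item_id values and counts the size of every pairwise intersection. The names `rows` and `overlap_matrix` are made up for illustration; the data is the sample table above.

```python
from collections import defaultdict
from itertools import product

# Sample rows from the question: (item_id, bucket_id)
rows = [
    (111, "A"), (111, "B"), (111, "C"),
    (222, "B"), (222, "D"),
    (333, "A"), (333, "C"),
    (444, "C"),
]


def overlap_matrix(rows):
    """Return {(row_bucket, col_bucket): number of shared item_ids}."""
    items_by_bucket = defaultdict(set)
    for item_id, bucket_id in rows:
        items_by_bucket[bucket_id].add(item_id)
    buckets = sorted(items_by_bucket)
    # Every ordered pair is computed, so the result is symmetric and the
    # diagonal holds each bucket's own item count.
    return {
        (x, y): len(items_by_bucket[x] & items_by_bucket[y])
        for x, y in product(buckets, repeat=2)
    }


matrix = overlap_matrix(rows)
buckets = sorted({bucket_id for _, bucket_id in rows})

# Print the matrix in the same row/column layout as the question.
print("   " + "  ".join(buckets))
for x in buckets:
    print(x + "  " + "  ".join(str(matrix[(x, y)]) for y in buckets))
```

The diagonal cells are simply each bucket's size, which is why C shows 3 against itself.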

I imagine this query involves PIVOT, but I can't for the life of me figure out how to get it working.

2 Answers:

Answer 0: (score: 1)

You can use a simple pivot built from conditional aggregation:

SELECT t1.bucket_id,
       SUM( CASE WHEN t2.bucket_id = 'A' THEN 1 ELSE 0 END ) AS A,
       SUM( CASE WHEN t2.bucket_id = 'B' THEN 1 ELSE 0 END ) AS B,
       SUM( CASE WHEN t2.bucket_id = 'C' THEN 1 ELSE 0 END ) AS C,
       SUM( CASE WHEN t2.bucket_id = 'D' THEN 1 ELSE 0 END ) AS D
FROM table1 t1
JOIN table1 t2 ON t1.item_id = t2.item_id
GROUP BY t1.bucket_id
ORDER BY 1
;

Or you can use the Oracle PIVOT clause (works on 11.2 and above):

SELECT * FROM (
   SELECT t1.bucket_id AS Y_bid,
          t2.bucket_id AS x_bid
   FROM table1 t1
   JOIN table1 t2 ON t1.item_id = t2.item_id
)
PIVOT (
  count(*) FOR x_bid in ('A','B','C','D')
)
ORDER BY 1
;

Example: http://sqlfiddle.com/#!4/39d30/7
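As a quick sanity check, the conditional-aggregation query can be run against the question's sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, which is an assumption made purely for convenience. SQLite has no PIVOT clause, so only the CASE-based version is exercised here.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (item_id INTEGER, bucket_id TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (111, "A"), (111, "B"), (111, "C"),
        (222, "B"), (222, "D"),
        (333, "A"), (333, "C"),
        (444, "C"),
    ],
)

# Self-join on item_id pairs every bucket of an item with every other
# bucket of the same item; the CASE sums turn those pairs into columns.
query = """
SELECT t1.bucket_id,
       SUM(CASE WHEN t2.bucket_id = 'A' THEN 1 ELSE 0 END) AS A,
       SUM(CASE WHEN t2.bucket_id = 'B' THEN 1 ELSE 0 END) AS B,
       SUM(CASE WHEN t2.bucket_id = 'C' THEN 1 ELSE 0 END) AS C,
       SUM(CASE WHEN t2.bucket_id = 'D' THEN 1 ELSE 0 END) AS D
FROM table1 t1
JOIN table1 t2 ON t1.item_id = t2.item_id
GROUP BY t1.bucket_id
ORDER BY 1
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The rows match the matrix in the question. Note that the counts rely on each (item_id, bucket_id) pair appearing only once, as the question states; duplicate rows would inflate the self-join.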

Answer 1: (score: 0)

I believe this gives you the data you need. Pivoting the table can then be done programmatically (or in Excel, etc.).

-- This gets the distinct pairs of buckets
select distinct
    a.name,
    b.name
from
    bucket a
    join bucket b
where
    a.name < b.name
order by
    a.name,
    b.name

+ --------- + --------- +
| name      | name      |
+ --------- + --------- +
| A         | B         |
| A         | C         |
| A         | D         |
| B         | C         |
| B         | D         |
| C         | D         |
+ --------- + --------- +
6 rows

-- This gets the distinct pairs of buckets with the counts you are looking for
select distinct
    a.name,
    b.name,
    count(distinct bi.item_id)
from
    bucket a
    join bucket b
    left outer join bucket_item ai on ai.bucket_name = a.name
    left outer join bucket_item bi on bi.bucket_name = b.name and ai.item_id = bi.item_id
where
    a.name < b.name
group by
    a.name,
    b.name
order by
    a.name,
    b.name

+ --------- + --------- + ------------------------------- +
| name      | name      | count(distinct bi.item_id)      |
+ --------- + --------- + ------------------------------- +
| A         | B         | 2                               |
| A         | C         | 1                               |
| A         | D         | 0                               |
| B         | C         | 2                               |
| B         | D         | 0                               |
| C         | D         | 0                               |
+ --------- + --------- + ------------------------------- +
6 rows
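The pivoting step mentioned above could look something like the following Python sketch. It takes the one-sided pair counts from the result set and mirrors them into a symmetric matrix. `pair_counts` and `pivot_pairs` are hypothetical names, and the values are copied by hand from the output above rather than fetched from a database.

```python
# Pair counts copied from the result set above: (bucket, bucket, shared items)
pair_counts = [
    ("A", "B", 2), ("A", "C", 1), ("A", "D", 0),
    ("B", "C", 2), ("B", "D", 0), ("C", "D", 0),
]


def pivot_pairs(pair_counts):
    """Build a symmetric nested dict from one-sided pair counts.

    The diagonal stays None: the query only returns pairs where
    a.name < b.name, so per-bucket totals are not part of its output.
    """
    buckets = sorted({name for a, b, _ in pair_counts for name in (a, b)})
    matrix = {x: {y: None for y in buckets} for x in buckets}
    for a, b, shared in pair_counts:
        matrix[a][b] = shared
        matrix[b][a] = shared  # mirror across the diagonal
    return buckets, matrix


buckets, matrix = pivot_pairs(pair_counts)

# Print the matrix, showing "-" for the diagonal cells that are unknown.
print("   " + "  ".join(buckets))
for x in buckets:
    cells = ("-" if matrix[x][y] is None else str(matrix[x][y]) for y in buckets)
    print(x + "  " + "  ".join(cells))
```

If the diagonal is needed, one option is to relax the filter to a.name <= b.name in the query, so that each bucket is also paired with itself.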

Here is the whole example, with DDL and inserts for setup (this is in MySQL, but the same idea applies elsewhere):

use example;

drop table if exists bucket;

drop table if exists item;

drop table if exists bucket_item;

create table bucket (
    name varchar(1)
);

create table item(
    id int
);

create table bucket_item(
    bucket_name varchar(1) references bucket(name),
    item_id int references item(id)
);

insert into bucket values ('A');
insert into bucket values ('B');
insert into bucket values ('C');
insert into bucket values ('D');

insert into item values (111);
insert into item values (222);
insert into item values (333);
insert into item values (444);
insert into item values (555);

insert into bucket_item values ('A',111);
insert into bucket_item values ('A',222);
insert into bucket_item values ('A',333);
insert into bucket_item values ('B',222);
insert into bucket_item values ('B',333);
insert into bucket_item values ('B',444);
insert into bucket_item values ('C',333);
insert into bucket_item values ('C',444);
insert into bucket_item values ('D',555);


-- query to get distinct pairs of buckets
select distinct
    a.name,
    b.name
from
    bucket a
    join bucket b
where
    a.name < b.name
order by
    a.name,
    b.name
;

select distinct
    a.name,
    b.name,
    count(distinct bi.item_id)
from
    bucket a
    join bucket b
    left outer join bucket_item ai on ai.bucket_name = a.name
    left outer join bucket_item bi on bi.bucket_name = b.name and ai.item_id = bi.item_id
where
    a.name < b.name
group by
    a.name,
    b.name
order by
    a.name,
    b.name
;