Question

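I have a table with two columns of interest, item_id and bucket_id. There is a fixed number of bucket_id values, and I can list them out if needed.

Each item_id can appear multiple times, but each occurrence will have a different bucket_id value. For example, an item_id of 123 can appear twice in the table, once under a bucket_id of A and once under B.

My goal is to determine how much overlap exists between each pair of bucket_id values and to display it as an N-by-N matrix.

For example, consider the following small sample table: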

item_id     bucket_id
=========   ===========
111         A
111         B
111         C
222         B
222         D
333         A
333         C
444         C

So for this dataset, buckets A and B have one item in common, buckets C and D have no items in common, and so on.

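I would like to format the table above into something like the following:

     A    B    C    D
===================================
A    2    1    2    0
B    1    2    1    1
C    2    1    3    0
D    0    1    0    1

In the table above, the intersection of a row and a column tells you how many records exist in both bucket_id values. For example, where the A row intersects the C column we have a 2, because there are 2 records in both bucket_id A and C. Because the intersection of X and Y is the same as the intersection of Y and X, the table is mirrored across the diagonal.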
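To make the target concrete, here is a minimal sketch in plain Python (not SQL) that computes the same matrix from the sample rows by intersecting per-bucket sets. The variable names (`rows`, `items_by_bucket`, `overlap`) are illustrative only and do not come from the original post.

```python
from collections import defaultdict

# Sample rows (item_id, bucket_id) copied from the question.
rows = [
    (111, "A"), (111, "B"), (111, "C"),
    (222, "B"), (222, "D"),
    (333, "A"), (333, "C"),
    (444, "C"),
]

# Map each bucket to the set of items it contains.
items_by_bucket = defaultdict(set)
for item_id, bucket_id in rows:
    items_by_bucket[bucket_id].add(item_id)

buckets = sorted(items_by_bucket)

# overlap[x][y] = number of items present in both bucket x and bucket y.
overlap = {
    x: {y: len(items_by_bucket[x] & items_by_bucket[y]) for y in buckets}
    for x in buckets
}

# Print the matrix with a header row.
print("  " + " ".join(buckets))
for x in buckets:
    print(x, *(overlap[x][y] for y in buckets))
```

The diagonal is simply the size of each bucket, since every bucket overlaps completely with itself.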

I imagine this query involves a PIVOT, but I can't for the life of me figure out how to get it working.

Answer 1

You can use a simple pivot (conditional aggregation over a self-join):

SELECT t1.bucket_id,
       SUM( CASE WHEN t2.bucket_id = 'A' THEN 1 ELSE 0 END ) AS A,
       SUM( CASE WHEN t2.bucket_id = 'B' THEN 1 ELSE 0 END ) AS B,
       SUM( CASE WHEN t2.bucket_id = 'C' THEN 1 ELSE 0 END ) AS C,
       SUM( CASE WHEN t2.bucket_id = 'D' THEN 1 ELSE 0 END ) AS D
FROM table1 t1
JOIN table1 t2 ON t1.item_id = t2.item_id
GROUP BY t1.bucket_id
ORDER BY 1
;
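As a quick way to try the query above without an Oracle instance, here is a rough sketch that runs it against an in-memory SQLite database from Python. The table name `table1` comes from the answer; using SQLite, and the assumption that each (item_id, bucket_id) pair appears only once, are mine. Duplicate pairs would inflate the counts.

```python
import sqlite3

# In-memory SQLite database standing in for the real table (an assumption;
# the answer itself targets Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (item_id INTEGER, bucket_id TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(111, "A"), (111, "B"), (111, "C"), (222, "B"), (222, "D"),
     (333, "A"), (333, "C"), (444, "C")],
)

# Self-join on item_id pairs every bucket of an item with every other bucket
# of the same item; the CASE expressions then count pairs per column.
query = """
SELECT t1.bucket_id,
       SUM(CASE WHEN t2.bucket_id = 'A' THEN 1 ELSE 0 END) AS A,
       SUM(CASE WHEN t2.bucket_id = 'B' THEN 1 ELSE 0 END) AS B,
       SUM(CASE WHEN t2.bucket_id = 'C' THEN 1 ELSE 0 END) AS C,
       SUM(CASE WHEN t2.bucket_id = 'D' THEN 1 ELSE 0 END) AS D
FROM table1 t1
JOIN table1 t2 ON t1.item_id = t2.item_id
GROUP BY t1.bucket_id
ORDER BY 1
"""

matrix_rows = conn.execute(query).fetchall()
for row in matrix_rows:
    print(row)
```

Each printed tuple is one row of the matrix from the question, starting with the row's bucket_id.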

Or you can use the Oracle PIVOT clause (works on 11.2 and above):

SELECT * FROM (
   SELECT t1.bucket_id AS Y_bid,
          t2.bucket_id AS x_bid
   FROM table1 t1
   JOIN table1 t2 ON t1.item_id = t2.item_id
)
PIVOT (
  count(*) FOR x_bid in ('A','B','C','D')
)
ORDER BY 1
;

Demo: http://sqlfiddle.com/#!4/39d30/7
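Both queries in this answer spell out the bucket list by hand. If the list changes often, one option is to generate the conditional-aggregation query from the distinct bucket_id values. The following is only a sketch under my own assumptions: `build_overlap_query` is a hypothetical helper, and SQLite is used here purely so the example can be run locally.

```python
import sqlite3


def build_overlap_query(bucket_ids):
    """Build the self-join query with one SUM(CASE ...) column per bucket.

    Hypothetical helper, not part of the original answer.
    """
    columns = []
    for bucket in bucket_ids:
        literal = "'" + bucket.replace("'", "''") + "'"   # SQL string literal
        alias = '"' + bucket.replace('"', '""') + '"'     # quoted column alias
        columns.append(
            f"SUM(CASE WHEN t2.bucket_id = {literal} THEN 1 ELSE 0 END) AS {alias}"
        )
    return (
        "SELECT t1.bucket_id,\n       "
        + ",\n       ".join(columns)
        + "\nFROM table1 t1\n"
        "JOIN table1 t2 ON t1.item_id = t2.item_id\n"
        "GROUP BY t1.bucket_id\n"
        "ORDER BY 1"
    )


# Sample data from the question, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (item_id INTEGER, bucket_id TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(111, "A"), (111, "B"), (111, "C"), (222, "B"), (222, "D"),
     (333, "A"), (333, "C"), (444, "C")],
)

# Read the bucket list from the data instead of hard-coding it.
bucket_ids = [
    r[0] for r in conn.execute(
        "SELECT DISTINCT bucket_id FROM table1 ORDER BY bucket_id"
    )
]
query = build_overlap_query(bucket_ids)
print(query)

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The trade-off is an extra round trip to fetch the bucket list before the main query can be built.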

Answer 2

I believe this gives you the data you need. Pivoting the result into a table can then be done programmatically (or in Excel, etc.).

-- This gets the distinct pairs of buckets
select distinct
    a.name,
    b.name
from
    bucket a
    join bucket b
where
    a.name < b.name
order by
    a.name,
    b.name

+ --------- + --------- +
| name      | name      |
+ --------- + --------- +
| A         | B         |
| A         | C         |
| A         | D         |
| B         | C         |
| B         | D         |
| C         | D         |
+ --------- + --------- +
6 rows

-- This gets the distinct pairs of buckets with the counts you are looking for
select distinct
    a.name,
    b.name,
    count(distinct bi.item_id)
from
    bucket a
    join bucket b
    left outer join bucket_item ai on ai.bucket_name = a.name
    left outer join bucket_item bi on bi.bucket_name = b.name and ai.item_id = bi.item_id
where
    a.name < b.name
group by
    a.name,
    b.name
order by
    a.name,
    b.name

+ --------- + --------- + ------------------------------- +
| name      | name      | count(distinct bi.item_id)      |
+ --------- + --------- + ------------------------------- +
| A         | B         | 2                               |
| A         | C         | 1                               |
| A         | D         | 0                               |
| B         | C         | 2                               |
| B         | D         | 0                               |
| C         | D         | 0                               |
+ --------- + --------- + ------------------------------- +
6 rows
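The "pivot it programmatically" step mentioned at the top of this answer could look roughly like the sketch below. It takes the pair counts from the result set above and mirrors them into a square grid. The names `pair_counts` and `matrix` are illustrative, and the diagonal is left empty because the query only returns pairs where a.name < b.name.

```python
# Pair counts copied from the result set above:
# (bucket_a, bucket_b, number_of_shared_items).
pair_counts = [
    ("A", "B", 2), ("A", "C", 1), ("A", "D", 0),
    ("B", "C", 2), ("B", "D", 0), ("C", "D", 0),
]

# Collect every bucket name that appears on either side of a pair.
buckets = sorted({name for a, b, _ in pair_counts for name in (a, b)})

# Start with an empty grid; diagonal cells stay None since the query
# never pairs a bucket with itself.
matrix = {x: {y: None for y in buckets} for x in buckets}
for a, b, shared in pair_counts:
    matrix[a][b] = shared
    matrix[b][a] = shared  # overlap is symmetric

# Print the grid, showing "-" for the cells the query does not provide.
print("  " + " ".join(buckets))
for x in buckets:
    cells = ["-" if matrix[x][y] is None else str(matrix[x][y]) for y in buckets]
    print(x + " " + " ".join(cells))
```

If the diagonal is needed, it would have to come from a separate per-bucket item count.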

Here is the whole example, including the DDL and inserts to set it up (this is in MySQL, but the same idea applies elsewhere):

use example;

drop table if exists bucket;

drop table if exists item;

drop table if exists bucket_item;

create table bucket (
    name varchar(1)
);

create table item(
    id int
);

create table bucket_item(
    bucket_name varchar(1) references bucket(name),
    item_id int references item(id)
);

insert into bucket values ('A');
insert into bucket values ('B');
insert into bucket values ('C');
insert into bucket values ('D');

insert into item values (111);
insert into item values (222);
insert into item values (333);
insert into item values (444);
insert into item values (555);

insert into bucket_item values ('A',111);
insert into bucket_item values ('A',222);
insert into bucket_item values ('A',333);
insert into bucket_item values ('B',222);
insert into bucket_item values ('B',333);
insert into bucket_item values ('B',444);
insert into bucket_item values ('C',333);
insert into bucket_item values ('C',444);
insert into bucket_item values ('D',555);


-- query to get distinct pairs of buckets
select distinct
    a.name,
    b.name
from
    bucket a
    join bucket b
where
    a.name < b.name
order by
    a.name,
    b.name
;

select distinct
    a.name,
    b.name,
    count(distinct bi.item_id)
from
    bucket a
    join bucket b
    left outer join bucket_item ai on ai.bucket_name = a.name
    left outer join bucket_item bi on bi.bucket_name = b.name and ai.item_id = bi.item_id
where
    a.name < b.name
group by
    a.name,
    b.name
order by
    a.name,
    b.name
;

Calculating overlap between groups
