Question

Good afternoon,

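I am working with a sample of dimensional data by state, county, and city. I am measuring individuals' addresses by Social Security Number (SSN). Here is a sample of the data set: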

CREATE TABLE dm_test_case
(
    state varchar(1),
    county varchar(1),
    city varchar(1),
    address int,
    ssn int
);


INSERT INTO dm_test_case (state, county, city, address, ssn)
VALUES  ('a','a','a',100,1),
        ('a','b','b',101,2),
        ('a','b','c',102,2),
        ('a','c','d',103,3),
        ('a','c','d',103,3),
        ('b','d','e',104,4),
        ('b','d','e',105,4);

SELECT  *
FROM    dm_test_case;


SELECT   state
        ,county
        ,city
        ,address
        ,COUNT(DISTINCT ssn) AS unique_persons
FROM    dm_test_case
GROUP BY    GROUPING SETS
            (
                (state, county, city, address),
                (state, county, city),
                (state, county),
                ()
            );

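What is interesting about this data set is the following:

SSN 1 is completely unique
SSN 2 has 2 addresses in 2 cities
SSN 3 is a duplicated record
SSN 4 has 2 addresses in 1 city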
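
The four observations above can be checked with a per-SSN summary. This is only a small sketch against the sample table; the column aliases row_count, addresses, and cities are names chosen for illustration.

```sql
-- One row per SSN: how many records, distinct addresses, and distinct cities it has.
SELECT   ssn
        ,COUNT(*)                AS row_count   -- 2 for SSN 3 because the record is duplicated
        ,COUNT(DISTINCT address) AS addresses   -- 2 for SSN 2 and SSN 4
        ,COUNT(DISTINCT city)    AS cities      -- 2 only for SSN 2
FROM    dm_test_case
GROUP BY ssn
ORDER BY ssn;
```

On the sample rows this should return (1, 1, 1) for SSN 1, (2, 2, 2) for SSN 2, (2, 1, 1) for SSN 3, and (2, 2, 1) for SSN 4.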

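Using any combination of grouping sets, the grand-total row reports 4 unique_persons. Depending on the grouping, that number may or may not be the true one (i.e. a person may live in two cities, counties, or states).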

What method can I use to show the true "sum" of unique_persons, no matter how the data is grouped, rather than the aggregate total?

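As an example, I would want the following query to return a value of 5 unique_persons for the grand-total grouping set (), because each of the five (state, county, city) groups contains exactly one distinct SSN and SSN 2 appears in two of them.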

SELECT   state
        ,county
        ,city
        ,COUNT(DISTINCT ssn) AS unique_persons
FROM    dm_test_case
GROUP BY    GROUPING SETS
            (
                (state, county, city),
                ()
            );
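
One possible way to get that total, offered as a sketch rather than a definitive answer: compute the distinct count once at the finest grain of interest, then SUM those counts across the grouping sets instead of running COUNT(DISTINCT) again. The CTE name per_city is hypothetical, and the query assumes a dialect that supports both CTEs and GROUPING SETS (for example SQL Server or PostgreSQL).

```sql
-- Step 1: distinct persons per (state, county, city), the finest grain used here.
WITH per_city AS
(
    SELECT   state
            ,county
            ,city
            ,COUNT(DISTINCT ssn) AS unique_persons
    FROM    dm_test_case
    GROUP BY state, county, city
)
-- Step 2: add up the per-city counts, so a person living in two cities
-- is counted once in each city and therefore twice in the total.
SELECT   state
        ,county
        ,city
        ,SUM(unique_persons) AS unique_persons
FROM    per_city
GROUP BY    GROUPING SETS
            (
                (state, county, city),
                ()
            );
```

On the sample data each of the five city rows carries 1 and the () row carries 5. Note that the total depends on the grain chosen in the CTE: grouping the CTE by (state, county) would sum to 4, and grouping by (state, county, city, address) would sum to 6.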

I hope this example is clear.

Cheers.

GROUPING SETS and COUNT(DISTINCT)

0 answers: