Question

基本上我试图在这个立方结果中得到一个明显的计数。但不幸的是你不能使用Count（distinct（Field））和cube（汇总）as stated here）

以下是数据的样子。（这只是一个我希望在数据中重复的简单例子）

    Category1       Category2       ItemId
    a               b               1
    a               b               1
    a               a               1
    a               a               2
    a               c               1
    a               b               2
    a               b               3
    a               c               2
    a               a               1
    a               a               3
    a               c               4

这是我想做的，但它不起作用。

SELECT
  Category1,
  Category2,
  Count(Distinct(ItemId))
FROM ItemList IL
GROUP BY
  Category1,
  Category2
WITH CUBE

我知道我可以像这样做一个子选择来得到我想要的结果：

SELECT
  *,
  (SELECT
     Count(Distinct(ItemId)) 
   FROM ItemList IL2 
   WHERE 
     (Q1.Category1 IS NULL OR Q1.Category1 IS NOT NULL AND Q1.Category1 = IL2.Category1) 
     AND
     (Q1.Category2 IS NULL OR Q1.Category2 IS NOT NULL AND Q1.Category2 = IL2.Category2))
       AS DistinctCountOfItems 
FROM (SELECT
        Category1,
        Category2
      FROM ItemList IL
      GROUP BY
        Category1,
        Category2
      WITH CUBE) Q1

但是由于子选择，结果集很大时运行速度很慢。有没有其他方法可以从立方结果中获得一个独特的计数？

这是我想看到的结果

Category1     Category2    DistinctCountOfItems
a             a            3
a             b            3
a             c            3
a             NULL         4
NULL          NULL         4
NULL          a            3
NULL          b            3
NULL          c            3

Answer 1

你应该能够清理你的“凌乱”答案：

select Category1, Category2, count(distinct ItemId)
from ItemList
group by Category1, Category2
UNION ALL
select Category1, null, count(distinct ItemId)
from ItemList
group by Category1
UNION ALL
select null, Category2, count(distinct ItemId)
from ItemList
group by Category2
UNION ALL
select null, null, count(distinct ItemId)
from ItemList

然后我提出了另一个选项：

select IL1.Category1, IL1.Category2, count(distinct ItemId)
from ( 
  select Category1, Category2
  from ItemList
  group by Category1, Category2
  with cube
 ) IL1
 join ItemList IL2 on (IL1.Category1=IL2.Category1 and IL1.Category2=IL2.Category2)
      or (IL1.Category1 is null and IL1.Category2=IL2.Category2)
      or (IL1.Category2 is null and IL1.Category1=IL2.Category1)
      or (IL1.Category1 is null and IL1.Category2 is null)
group by IL1.Category1, IL1.Category2

效率可能因索引，分组列数等而异。对于我写的测试表，子选择和连接（与Unions相对）稍好一些。我目前无法访问MSSQL 2000实例（我在2005年的实例上测试过），但我认为这里的任何内容都无效。

<强>更新

一个更好的选项，特别是如果你要分组超过2列（如果你在8列上分组，上面的代码将需要256个join子句来捕获所有空组合！）：

select IL1.Category1, IL1.Category2, count(distinct ItemId)
from ( 
  select Category1, Category2
  from ItemList
  group by Category1, Category2
  with cube
 ) IL1
 inner join ItemList IL2 on isnull(IL1.Category1,IL2.Category1)=IL2.Category1
                  and isnull(IL1.Category2,IL2.Category2)=IL2.Category2
group by IL1.Category1, IL1.Category2

Answer 2

这是我发现的另一种可能性，但它非常凌乱。但是它比使用子选择运行得更快。

SELECT 
  category1, 
  category2, 
  count(distinct itemid)
FROM (SELECT DISTINCT 
        category1, 
        category2, 
        itemid
      FROM ItemList
) x
GROUP BY category1, category2
UNION ALL
SELECT 
  category1, 
  NULL, 
  count(distinct itemid)
FROM (SELECT DISTINCT 
        category1, 
        category2, 
        itemid
      FROM ItemList
) x
GROUP BY category1
UNION ALL
SELECT 
  NULL, 
  category2, 
  count(distinct itemid)
FROM (SELECT DISTINCT 
        category1, 
        category2, 
        itemid
      FROM ItemList
) x
GROUP BY category2
UNION ALL
SELECT 
  NULL, 
  NULL, 
  count(distinct itemid)
FROM (SELECT DISTINCT 
        category1, 
        category2, 
        itemid
      FROM ItemList
) x

Answer 3

这非常有趣。我可以在SQL Server 2008 R2中运行您的第一个查询，但文档说它不起作用。

这是您的第二个查询的变体，可能会有更好的表现。它在子查询中执行非重复计数，在外部查询中执行多维数据集

SELECT Category1, Category2, MAX(DistinctCount) as DistinctCount
FROM (
    SELECT Category1, Category2, COUNT(DISTINCT ItemId) as DistinctCount
    FROM ItemList
    GROUP BY Category1, Category2
    ) Q1
GROUP BY Category1, Category2
WITH CUBE

Answer 4

这个怎么样？

内部查询将返回不同的结果。

SELECT ORIGINAL_ITEM.Category1, DISTINCT_ITEM.Category2, DISTINCT_ITEM.cnt
FROM
  ( SELECT DISTINCT category2, COUNT(*) as CNT
    FROM ItemList ) DISTINCT_ITEM
JOIN ItemList ORIGINAL_ITEM on ORIGINAL_ITEM.category2 = DISTINCT_ITEM.category2 
GROUP BY ORIGINAL_ITEM.category1, DISTINCT_ITEM.category2

Answer 5

我有以下版本：

当我运行您的查询时

SELECT 
  Category1, 
  Category2, 
  COUNT(DISTINCT(ItemId))
FROM ItemList IL 
GROUP BY 
  Category1, 
  Category2 
WITH CUBE

我得到这些结果

a       a       3
a       b       3
a       c       3
NULL    a       3
NULL    b       3
NULL    c       3
a       NULL    4
NULL    NULL    4

SQL Server使用“分组依据...使用多维数据集”获得独特计数

5 个答案: