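Basically, I am trying to get a distinct count within this cube result. Unfortunately, you cannot use Count(distinct(Field)) together with WITH CUBE or WITH ROLLUP (as stated here).
Here is what the data looks like. (This is just a simple example; I do expect duplicates in the data.)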
Category1 Category2 ItemId
a b 1
a b 1
a a 1
a a 2
a c 1
a b 2
a b 3
a c 2
a a 1
a a 3
a c 4
This is what I would like to do, but it does not work.
SELECT
Category1,
Category2,
Count(Distinct(ItemId))
FROM ItemList IL
GROUP BY
Category1,
Category2
WITH CUBE
I know I can use a sub-select like this to get the result I want:
SELECT
*,
(SELECT
Count(Distinct(ItemId))
FROM ItemList IL2
WHERE
(Q1.Category1 IS NULL OR Q1.Category1 IS NOT NULL AND Q1.Category1 = IL2.Category1)
AND
(Q1.Category2 IS NULL OR Q1.Category2 IS NOT NULL AND Q1.Category2 = IL2.Category2))
AS DistinctCountOfItems
FROM (SELECT
Category1,
Category2
FROM ItemList IL
GROUP BY
Category1,
Category2
WITH CUBE) Q1
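But because of the sub-select, this runs slowly when the result set is large. Is there any other way to get a distinct count from a cube result?
This is the result I want to see: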
Category1 Category2 DistinctCountOfItems
a a 3
a b 3
a c 3
a NULL 4
NULL NULL 4
NULL a 3
NULL b 3
NULL c 3
Answer 0 (score: 6)
You should be able to clean up your "messy" answer like so:
select Category1, Category2, count(distinct ItemId)
from ItemList
group by Category1, Category2
UNION ALL
select Category1, null, count(distinct ItemId)
from ItemList
group by Category1
UNION ALL
select null, Category2, count(distinct ItemId)
from ItemList
group by Category2
UNION ALL
select null, null, count(distinct ItemId)
from ItemList
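As a quick sanity check of the UNION ALL version above, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server (an assumption made so the example is self-contained); the names ROWS, UNION_QUERY and run_union_query are hypothetical and exist only for illustration.

```python
import sqlite3

# Sample rows from the question: (Category1, Category2, ItemId)
ROWS = [
    ("a", "b", 1), ("a", "b", 1), ("a", "a", 1), ("a", "a", 2),
    ("a", "c", 1), ("a", "b", 2), ("a", "b", 3), ("a", "c", 2),
    ("a", "a", 1), ("a", "a", 3), ("a", "c", 4),
]

# One SELECT per grouping set; NULL marks the rolled-up column.
UNION_QUERY = """
select Category1, Category2, count(distinct ItemId)
from ItemList
group by Category1, Category2
union all
select Category1, null, count(distinct ItemId)
from ItemList
group by Category1
union all
select null, Category2, count(distinct ItemId)
from ItemList
group by Category2
union all
select null, null, count(distinct ItemId)
from ItemList
"""


def run_union_query():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table ItemList (Category1 text, Category2 text, ItemId integer)"
    )
    conn.executemany("insert into ItemList values (?, ?, ?)", ROWS)
    result = conn.execute(UNION_QUERY).fetchall()
    conn.close()
    return result


union_rows = run_union_query()
for row in union_rows:
    print(row)
```

The eight rows it returns should match the result table in the question, with Python's None standing in for NULL. Row order is not guaranteed, so compare the rows as a set.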
Then I came up with another option:
select IL1.Category1, IL1.Category2, count(distinct ItemId)
from (
select Category1, Category2
from ItemList
group by Category1, Category2
with cube
) IL1
join ItemList IL2 on (IL1.Category1=IL2.Category1 and IL1.Category2=IL2.Category2)
or (IL1.Category1 is null and IL1.Category2=IL2.Category2)
or (IL1.Category2 is null and IL1.Category1=IL2.Category1)
or (IL1.Category1 is null and IL1.Category2 is null)
group by IL1.Category1, IL1.Category2
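Efficiency may vary depending on indexes, the number of columns being grouped, and so on. For the test table I wrote, the sub-select and the join (as opposed to the unions) performed slightly better. I don't currently have access to an MSSQL 2000 instance (I tested on a 2005 instance), but I don't think anything here is invalid there.
UPDATE
An even better option, especially if you are grouping on more than 2 columns (if you group on 8 columns, the code above would need 256 join clauses to catch all the null combinations!):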
select IL1.Category1, IL1.Category2, count(distinct ItemId)
from (
select Category1, Category2
from ItemList
group by Category1, Category2
with cube
) IL1
inner join ItemList IL2 on isnull(IL1.Category1,IL2.Category1)=IL2.Category1
and isnull(IL1.Category2,IL2.Category2)=IL2.Category2
group by IL1.Category1, IL1.Category2
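To see the isnull join at work on the question's sample data, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. Two things in it are assumptions rather than part of the answer: SQLite has no WITH CUBE, so the inner query builds the same grouping rows with unions, and SQLite spells the function ifnull instead of isnull. The names ROWS, CUBE_JOIN_QUERY and run_cube_join are hypothetical.

```python
import sqlite3

# Sample rows from the question: (Category1, Category2, ItemId)
ROWS = [
    ("a", "b", 1), ("a", "b", 1), ("a", "a", 1), ("a", "a", 2),
    ("a", "c", 1), ("a", "b", 2), ("a", "b", 3), ("a", "c", 2),
    ("a", "a", 1), ("a", "a", 3), ("a", "c", 4),
]

CUBE_JOIN_QUERY = """
select IL1.Category1, IL1.Category2, count(distinct IL2.ItemId)
from (
    -- stand-in for GROUP BY Category1, Category2 WITH CUBE
    select Category1, Category2 from ItemList group by Category1, Category2
    union all
    select Category1, null from ItemList group by Category1
    union all
    select null, Category2 from ItemList group by Category2
    union all
    select null, null
) IL1
-- a NULL in IL1 falls back to IL2's own value, so it matches every row
join ItemList IL2
  on ifnull(IL1.Category1, IL2.Category1) = IL2.Category1
 and ifnull(IL1.Category2, IL2.Category2) = IL2.Category2
group by IL1.Category1, IL1.Category2
"""


def run_cube_join():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table ItemList (Category1 text, Category2 text, ItemId integer)"
    )
    conn.executemany("insert into ItemList values (?, ?, ?)", ROWS)
    result = conn.execute(CUBE_JOIN_QUERY).fetchall()
    conn.close()
    return result


cube_join_rows = run_cube_join()
for row in cube_join_rows:
    print(row)
```

One caveat that applies to the T-SQL version as well: the trick relies on NULL meaning "rolled up". If a category column holds real NULL values, the comparison `isnull(x, y) = y` evaluates to NULL = NULL for those rows, which is not true, so they silently drop out of the join.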
Answer 1 (score: 1)
Here is another possibility I found, but it is very messy. It does, however, run faster than the sub-select version.
SELECT
category1,
category2,
count(distinct itemid)
FROM (SELECT DISTINCT
category1,
category2,
itemid
FROM ItemList
) x
GROUP BY category1, category2
UNION ALL
SELECT
category1,
NULL,
count(distinct itemid)
FROM (SELECT DISTINCT
category1,
category2,
itemid
FROM ItemList
) x
GROUP BY category1
UNION ALL
SELECT
NULL,
category2,
count(distinct itemid)
FROM (SELECT DISTINCT
category1,
category2,
itemid
FROM ItemList
) x
GROUP BY category2
UNION ALL
SELECT
NULL,
NULL,
count(distinct itemid)
FROM (SELECT DISTINCT
category1,
category2,
itemid
FROM ItemList
) x
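Answer 2 (score: -1)
This is very interesting. I can run your first query in SQL Server 2008 R2, but the documentation says it should not work.
Here is a variation of your second query that might perform better. It does the distinct count in the subquery and the cube in the outer query.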
SELECT Category1, Category2, MAX(DistinctCount) as DistinctCount
FROM (
SELECT Category1, Category2, COUNT(DISTINCT ItemId) as DistinctCount
FROM ItemList
GROUP BY Category1, Category2
) Q1
GROUP BY Category1, Category2
WITH CUBE
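One thing worth checking before relying on this variant: on the rolled-up rows, the MAX of the per-group distinct counts is not necessarily the distinct count across those groups. The sketch below models both calculations in plain Python on the question's sample rows. It does not run the SQL itself, and the function names are hypothetical. It also assumes the category columns never contain real NULLs, so None can mark a rolled-up column.

```python
from collections import defaultdict

# Sample rows from the question: (Category1, Category2, ItemId)
ROWS = [
    ("a", "b", 1), ("a", "b", 1), ("a", "a", 1), ("a", "a", 2),
    ("a", "c", 1), ("a", "b", 2), ("a", "b", 3), ("a", "c", 2),
    ("a", "a", 1), ("a", "a", 3), ("a", "c", 4),
]


def cube_keys(c1, c2):
    """All cube groupings a (c1, c2) pair contributes to; None = rolled up."""
    return ((c1, c2), (c1, None), (None, c2), (None, None))


def distinct_counts(rows):
    """True distinct ItemId count for every cube grouping."""
    items = defaultdict(set)
    for c1, c2, item in rows:
        for key in cube_keys(c1, c2):
            items[key].add(item)
    return {key: len(ids) for key, ids in items.items()}


def max_of_group_counts(rows):
    """Model of MAX(DistinctCount) over a cube of the per-group counts."""
    base = defaultdict(set)
    for c1, c2, item in rows:
        base[(c1, c2)].add(item)
    result = {}
    for (c1, c2), ids in base.items():
        for key in cube_keys(c1, c2):
            result[key] = max(result.get(key, 0), len(ids))
    return result


true_counts = distinct_counts(ROWS)
max_counts = max_of_group_counts(ROWS)
for key in true_counts:
    if true_counts[key] != max_counts[key]:
        print(key, "distinct:", true_counts[key], "max:", max_counts[key])
```

On the sample data the two agree for every row with a concrete Category2, but the (a, NULL) and (NULL, NULL) rows come out as 3 under MAX where the desired result is 4, because item 4 appears only under category c while each individual group holds just three distinct items.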
Answer 3 (score: -1)
How about this?
The inner query will return the distinct result.
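SELECT ORIGINAL_ITEM.Category1, DISTINCT_ITEM.Category2, DISTINCT_ITEM.cnt
FROM
( -- one row per Category2 with its distinct item count
SELECT Category2, COUNT(DISTINCT ItemId) AS cnt
FROM ItemList
GROUP BY Category2 ) DISTINCT_ITEM
JOIN ItemList ORIGINAL_ITEM ON ORIGINAL_ITEM.Category2 = DISTINCT_ITEM.Category2
-- cnt is selected without an aggregate, so it must appear in GROUP BY
GROUP BY ORIGINAL_ITEM.Category1, DISTINCT_ITEM.Category2, DISTINCT_ITEM.cnt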
Answer 4 (score: -1)
I have the following version:
Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (Intel X86) Apr 2 2010 15:53:02 Copyright (c) Microsoft Corporation Express Edition with Advanced Services on Windows NT 5.1 (Build 2600: Service Pack 3)
When I run your query
SELECT
Category1,
Category2,
COUNT(DISTINCT(ItemId))
FROM ItemList IL
GROUP BY
Category1,
Category2
WITH CUBE
I get these results
a a 3
a b 3
a c 3
NULL a 3
NULL b 3
NULL c 3
a NULL 4
NULL NULL 4