Question

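I know they return different results (the former counts NULLs, the latter does not). That is not my question. Imagine a case where I don't care, either because there are no NULLs, or because there are only a few and I just want a rough idea of the number of rows in the database.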

My question is about the following (apparent) contradiction:

Here, one of the highest-reputation users in the SQL tag says:

Your use of COUNT(*) or COUNT(column) should be based on the desired output only.

On the other hand, here is a comment upvoted 47 times saying:

...if you have a non-nullable column such as ID, then count(ID) will significantly improve performance over count(*).

The two seem to contradict each other. Can someone explain which one is correct, and why?
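The semantic difference mentioned at the start can be seen on a tiny inline row set. This is a minimal T-SQL sketch; the derived table t and its column v are made-up names used only for illustration:

```sql
-- Three rows; the second one has a NULL in column v.
select
    count(*) as count_star,  -- 3: every row is counted
    count(v) as count_v      -- 2: the NULL in v is skipped
from (values (1), (null), (3)) as t(v);
```

When the column has no NULLs, both expressions return the same number, which is the situation the question is asking about.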

Answer 1

Building on scsimon's answer, I ran a test on SQL Server 2017 with 10 million rows. The performance results are at the bottom.

Edit

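Following Jeroen's suggestion in the comments, I added more columns to the test table: an int identity, a bigint not null, and a nullable tinyint. In every case, the select count query chose to scan the index on the tinyint column. The estimated cost was the same for all of the queries, but select count(NullableColumn) was much slower than the other select count variants.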

Code

if object_id('tempdb..#table') is null
begin
    declare @Count bigint = 10000000

    create table #Table
    (
        id int identity primary key clustered,
        BigID bigint not null,
        TinyC tinyint null
    );

    WITH
        E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), -- 10 rows
        E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 10^2 = 100 rows
        E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 10^4 = 10,000 rows max
        E16(N) AS (SELECT 1 FROM E4 a, E4 b), -- 10^8 = 100,000,000 rows max (enough for @Count)
        cteTally(N) AS 
        (
            SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E16
        )
    insert into #table (BigId, TinyC)
    select N, case when N % 2 = 0 then null else N % 8 end
    from cteTally
    where N <= @Count;

    create unique index IX_BigID on #table (BigID);
    create index IX_TinyC on #table (TinyC);
end

set statistics io on
set statistics time on

print 'count(*)'
select count(*) from #table
option (maxdop 1)

print 'count(ID)'
select count(ID) from #table
option (maxdop 1)

print 'count(BigID)'
select count(BigID) from #table
option (maxdop 1)

print 'count(TinyC)'
select count(TinyC) from #table
option (maxdop 1)

set statistics io off
set statistics time off

--drop table #table

Performance

count(*)

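Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 735 ms, elapsed time = 746 ms.

count(ID) -- int identity primary key clustered

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 765 ms, elapsed time = 776 ms.

count(BigID) -- bigint not null, indexed

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 735 ms, elapsed time = 731 ms.

count(TinyC) -- tinyint nullable, indexed

Warning: Null value is eliminated by an aggregate or other SET operation.

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 1593 ms, elapsed time = 1584 ms.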
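COUNT(column) is defined to count only the rows in which the column is not NULL, so on a nullable column it always returns the same number as COUNT(*) with an explicit IS NOT NULL filter. The engine therefore has to check each value for NULL, which is one plausible reason the nullable column costs more CPU above, although the timings alone don't prove it. A minimal, self-contained sketch; the table #demo and its values are hypothetical stand-ins for #table, and the example shows the semantics only, not relative speed:

```sql
-- Hypothetical small table with a nullable tinyint column, standing in for #table.
create table #demo (TinyC tinyint null);
insert into #demo (TinyC) values (1), (null), (3), (null), (5);

-- COUNT(column) counts only the rows where the column is not NULL ...
select count(TinyC) as count_col from #demo;                           -- 3
-- ... which is the same as COUNT(*) restricted to non-NULL values.
select count(*) as count_filtered from #demo where TinyC is not null;  -- 3

drop table #demo;
```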

Answer 2

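I would expect count(column) to be slightly slower, but that wasn't the case in a quick test (below). More important, though, is the first link you posted, since count(*) and count(column) can produce different results when the column is nullable. Also, I'm assuming you aren't returning any other columns, so I left out any testing that depends on your particular environment and indexes.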

WITH
    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 10^2 = 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 10^4 = 10,000 rows max
    cteTally(N) AS 
    (
        SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    )
select  N
into #table
from cteTally


update #table
set N = null
where N % 2 = 0

select count(*) from #table   -- 10000: every row is counted
select count(N) from #table   -- 5000: the NULLs set above are skipped

Similarly, on a table with 361,912 rows, here are the results for count(*), count(pk_column), and count(nullable_column), where the nullable column is not part of an index:

[Image: Count(*) vs Count(id) speed]
