Question

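I have a SQL Server 2008 instance with a large table on which I need to run a COUNT DISTINCT query across multiple columns. Some of the columns are varchar and others are int.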

The query so far looks like this:

SELECT 
    CAST(datepart(yyyy, [HistDate]) as varchar(4)) + '-' + CAST(datepart(mm, [HistDate]) as varchar(2)) + '-1' AS [DateSelector], 
    [Document] AS [Document], 
    -- This is the bit that needs optimizing
    COUNT( DISTINCT(
    Document + 
    Reference + 
    CONVERT(varchar(20),BatchID) +              -- this is an int
    ISNULL(CONVERT(varchar(20),ResetCount),'')) -- this is an int
FROM documents
GROUP BY
    CAST(datepart(yyyy, [HistDate]) as varchar(4)) + '-' + CAST(datepart(mm, [HistDate]) as varchar(2)) + '-1' AS [DateSelector], 
    [Document] AS [Document], 
ORDER BY ...

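Currently this query takes 23 seconds, whereas replacing the COUNT above with COUNT(*) takes only a few seconds. I tried adding a composite index, which produced no improvement. What optimizations can I make to speed up the query?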

Answer 1

You can reduce the run time by using:

group by datepart(yyyy, Zeitstempel), datepart(mm, Zeitstempel)

That way you group only on the integers, without the conversion, and can still use the converted value in the SELECT list.
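
Applied to the query in the question, that suggestion might look like the sketch below. It assumes the `documents` table and `HistDate` column from the question (`Zeitstempel` above appears to be the answerer's own column name), and `COUNT(*)` is only a placeholder for whatever aggregate is actually needed.

```sql
SELECT
    -- The display string is built from the grouped integers in the SELECT list only
    CAST(DATEPART(yyyy, [HistDate]) AS varchar(4)) + '-'
        + CAST(DATEPART(mm, [HistDate]) AS varchar(2)) + '-1' AS [DateSelector],
    [Document],
    COUNT(*) AS cnt   -- placeholder aggregate (assumption)
FROM documents
GROUP BY
    -- Group on the raw integer date parts instead of the concatenated string
    DATEPART(yyyy, [HistDate]),
    DATEPART(mm, [HistDate]),
    [Document];
```

This only removes the string conversion from the grouping step; it does not by itself address the cost of the COUNT DISTINCT.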

Answer 2

Concatenating the columns does not improve performance.

Try this instead:

;WITH CTE AS
(
  SELECT 
    [HistDate],
    [Document] AS [Document], 
    -- partition on the separate columns; adding a varchar to an int would force an implicit conversion
    row_number() over (partition by Document, Reference, BatchID, ResetCount order by (select 1)) rn
  FROM documents
)
SELECT
  convert(char(8),dateadd(mm, 
    datediff(mm, 0, [HistDate]), 0), 126)+'1' AS [DateSelector], 
  [Document] AS [Document],
  count(*) as cnt
FROM CTE
WHERE rn = 1
GROUP BY
  -- note: you can't alias a column in GROUP BY
  dateadd(month, datediff(month, 0, [HistDate]), 0),
  [Document]
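
A variant of the same idea, offered as a sketch rather than a tested replacement: deduplicate the raw columns per month in a derived table, then count the surviving rows. The names `MonthStart`, `d`, and `cnt` are illustrative. Unlike the query above, the month is part of the deduplication key here, so a combination that occurs in two different months is counted once in each month, which is what the per-group COUNT DISTINCT in the question does.

```sql
SELECT
    CONVERT(char(8), d.MonthStart, 126) + '1' AS [DateSelector],
    d.[Document],
    COUNT(*) AS cnt
FROM (
    -- One row per distinct (month, Document, Reference, BatchID, ResetCount)
    SELECT DISTINCT
        DATEADD(month, DATEDIFF(month, 0, [HistDate]), 0) AS MonthStart,
        [Document],
        Reference,
        BatchID,
        ResetCount
    FROM documents
) AS d
GROUP BY
    d.MonthStart,
    d.[Document];
```

Two differences from the original concatenation are worth checking against real data. With the default CONCAT_NULL_YIELDS_NULL setting, a NULL in Document, Reference, or BatchID makes the concatenated value NULL, and COUNT(DISTINCT ...) skips it, whereas SELECT DISTINCT keeps such rows as their own combination. Concatenation can also merge different combinations into the same string (for example '1' + '23' and '12' + '3'), which comparing the columns separately avoids.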
