Question

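I'm running queries in SQL Server to count the number of unique email addresses in our database for each Australian state. However, when I try to reconcile the numbers to check that they are correct, I get a discrepancy that makes me think my queries are wrong. Here are the queries I use to reconcile the numbers, along with the actual results: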

/** Count the total number of active members (status=1) since last night **/
SELECT count(distinct(email)) Total FROM [member] WHERE status = 1 
AND (created_datetime <= '2013-01-11' OR created_datetime IS NULL)
/** RESULT: 8958 **/

/** Count the number of active members (status=1) who live in Victoria since last night **/
SELECT count(distinct(email)) Total FROM [member] WHERE status = 1 
AND (created_datetime <= '2013-01-11' OR created_datetime IS NULL)
AND [state] = 'vic'
/** RESULT: 7545 **/

/** Count the number of active members (status=1) who don't live in Victoria since last night **/
SELECT count(distinct(email)) Total FROM [member] WHERE status = 1 
AND (created_datetime <= '2013-01-11' OR created_datetime IS NULL)
AND [state] <> 'vic'
/** RESULT:1446 **/

/** Add the two results to see how they compare to the total **/
SELECT 7545+1446
/** RESULT:8991 **/

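You'll notice that the total number of distinct emails is 8958, but adding the emails in Victoria to the emails not in Victoria gives 8991, which is different. Am I using the count distinct function incorrectly?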

Answer 1

The created_datetime condition in the WHERE clause differs between the queries. In the first query it is

WHERE status = 1 
AND (created_datetime <= '2013-01-10 23:59:59' OR created_datetime IS NULL)

and in the other two queries it is

WHERE status = 1 
AND (created_datetime <= '2013-01-31 00:00:00' OR created_datetime IS NULL)
AND [state] <> 'vic'
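
One way to rule out this kind of mismatch is to declare the cutoff once and reuse it in every query. The following is only a sketch: the variable name @cutoff is hypothetical, created_datetime is assumed to be a datetime column, and the inline initializer assumes SQL Server 2008 or later.

```sql
-- Hypothetical variable; assumes created_datetime is a datetime column.
-- The 'YYYYMMDD' literal form is interpreted the same way under any
-- language or DATEFORMAT setting.
DECLARE @cutoff datetime = '20130111';

-- Total active members created before the cutoff (or with no creation date).
-- "<" means strictly before midnight on 11 Jan; use "<=" instead to match
-- the literal form used in the question.
SELECT COUNT(DISTINCT email) AS Total
FROM [member]
WHERE status = 1
  AND (created_datetime < @cutoff OR created_datetime IS NULL);

-- Same filter, Victoria only.
SELECT COUNT(DISTINCT email) AS Total
FROM [member]
WHERE status = 1
  AND (created_datetime < @cutoff OR created_datetime IS NULL)
  AND [state] = 'vic';

-- Same filter, every non-NULL state other than Victoria.
SELECT COUNT(DISTINCT email) AS Total
FROM [member]
WHERE status = 1
  AND (created_datetime < @cutoff OR created_datetime IS NULL)
  AND [state] <> 'vic';
```

With a single variable, all three counts are guaranteed to cover the same date range, so any remaining gap has to come from something else.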

Raj

Answer 2

In addition to the answers from @Raj and @MarkD, I'd add another observation: shouldn't

OR created_datetime IS NULL

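appear in only one of the statements rather than both? If it is present in both, there will be duplicates, and the result of the "total" query will never match the sum of the individual queries.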

Answer 3

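You are counting distinct emails. If a user from Victoria has the same email as a user from somewhere else, that email counts as 1 in the total.

When you count the Victorian and non-Victorian emails separately, that email counts as 1 in each query, for a total of 2 (if you dare add them up), and that is the discrepancy you are seeing.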
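
To check whether shared emails explain the gap, one could list the emails that occur on both a 'vic' row and a non-'vic' row. This is a sketch that reuses the filters from the question. If such overlap were the only cause, and no rows had a NULL [state], it would return 8991 - 8958 = 33 rows.

```sql
-- Emails that appear on at least one 'vic' row AND at least one non-'vic' row.
-- Each such email is counted once in the total but once in each per-state count.
SELECT email
FROM [member]
WHERE status = 1
  AND (created_datetime <= '2013-01-11' OR created_datetime IS NULL)
  AND email IS NOT NULL  -- COUNT(DISTINCT email) ignores NULL emails as well
GROUP BY email
HAVING SUM(CASE WHEN [state] =  'vic' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN [state] <> 'vic' THEN 1 ELSE 0 END) > 0;
```

Rows with a NULL [state] fall into the ELSE branch of both CASE expressions, so they count toward neither side.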

Answer 4

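The remaining [state] values may be NULL: a row whose [state] is NULL matches neither [state] = 'vic' nor [state] <> 'vic', so it is missing from both per-state counts. Also, as Raj pointed out, the DATETIME values in your queries differ.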

SELECT count(distinct(email)) Total FROM [member] WHERE status = 1 
AND (created_datetime <= '2013-01-31 00:00:00' OR created_datetime IS NULL)
AND [state] IS NULL
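
The three cases can also be compared side by side in a single pass. This is a sketch using the filters from the question; the column aliases are illustrative only.

```sql
-- One pass over the filtered rows, with one distinct count per category.
-- A CASE without ELSE yields NULL, and COUNT(DISTINCT ...) ignores NULLs,
-- so each column counts only the emails seen in its own category.
SELECT
    COUNT(DISTINCT email)                                      AS total,
    COUNT(DISTINCT CASE WHEN [state] =  'vic' THEN email END)  AS vic,
    COUNT(DISTINCT CASE WHEN [state] <> 'vic' THEN email END)  AS not_vic,
    COUNT(DISTINCT CASE WHEN [state] IS NULL  THEN email END)  AS state_null
FROM [member]
WHERE status = 1
  AND (created_datetime <= '2013-01-11' OR created_datetime IS NULL);
```

Here vic + not_vic + state_null can exceed total, because an email that occurs in more than one category is counted once per category but only once in total. The sums match only when no email spans categories.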

Discrepancy with a "count(distinct(field))" statement in a SQL query
