SQL subquery: where not exists prior to today

Time: 2014-05-30 14:18:14

Tags: sql tsql count grouping

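I have a database of IP addresses that is constantly being updated. I need to know how many unique IP addresses were added today that had not been added before today.

I tried to do this with a subquery, but the query then takes several minutes to process millions of records.

Here is the original query: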

SELECT 
   visitDate,
   COUNT(*) AS TotalDistinctIPs,
   SUM(CASE WHEN [type] = 'C' THEN 1 ELSE 0 END) as CustomerIPs
   -- SUM(CustomerIPs Where does not exist in prior day!)
FROM (
    SELECT DISTINCT
        V.ip,
        CAST(V.visitdate AS DATE) AS visitdate,
        [type]
        FROM IP_Addresses V
    ) x
GROUP BY
    visitdate
ORDER BY visitdate DESC

Here is some sample data, which I have annotated to illustrate what I am trying to do:

CREATE TABLE [dbo].[IP_Addresses](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [IP] [varchar](20) NOT NULL,
    [type] [char](1) NOT NULL,
    [visitDate] [datetime] NULL,
 CONSTRAINT [PK_IP_Addresses] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
SET IDENTITY_INSERT [dbo].[IP_Addresses] ON
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (1, N'192.168.0.1', N'C', CAST(0x0000A33B00A63920 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (2, N'192.168.0.2', N'C', CAST(0x0000A33B00C72EA0 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (3, N'192.168.0.4', N'P', CAST(0x0000A33A011C5F38 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (4, N'192.168.0.5', N'C', CAST(0x0000A33A00C72EA0 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (5, N'192.168.0.6', N'C', CAST(0x0000A33900F89EE0 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (6, N'192.168.0.7', N'C', CAST(0x0000A33800A63920 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (7, N'192.168.0.8', N'P', CAST(0x0000A33700875D84 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (8, N'192.168.0.1', N'C', CAST(0x0000A3360089CA9C AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (9, N'192.168.0.5', N'C', CAST(0x0000A336006660FC AS DateTime))
SET IDENTITY_INSERT [dbo].[IP_Addresses] OFF

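The goal is to add a New Customer IPs column. This column must contain the count of unique customer IP addresses that did not visit on any earlier day.

Using this data set as an example, the desired output looks like this: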

visitDate   TotalDistinctIPs  CustomerIPs  New Customer IPs
2014-05-30  2                 2            1
2014-05-29  2                 1            0
2014-05-28  1                 1            1
2014-05-27  1                 1            1
2014-05-26  1                 0            0
2014-05-25  2                 2            2

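Row 1 (visitDate 2014-05-30): id = 1 is not a new customer (see id = 8); id = 2 is a new customer (it does not exist on any prior day).

Row 2 (visitDate 2014-05-29): id = 3 is not a customer (type is 'P'); id = 4 is not a new customer (see id = 9 for the earlier visit).

Row 3 (visitDate 2014-05-28): id = 5 is a new customer.

And so on...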
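The row-by-row notes above can be checked with a per-row flag. This is only a sketch for verifying the expected values on the sample data, not a tuned query for millions of rows; the aliases a and p and the column name IsNewCustomerIP are made up for illustration:

```sql
-- Sketch: flag a row as a new customer IP when it is type 'C'
-- and no row with the same IP exists on an earlier calendar day.
SELECT a.id,
       a.IP,
       a.[type],
       CAST(a.visitDate AS DATE) AS visitDate,
       CASE WHEN a.[type] = 'C'
             AND NOT EXISTS (SELECT 1
                             FROM dbo.IP_Addresses p
                             WHERE p.IP = a.IP
                               -- strictly before midnight of this row's day
                               AND p.visitDate < CAST(CAST(a.visitDate AS DATE) AS DATETIME))
            THEN 1 ELSE 0 END AS IsNewCustomerIP
FROM dbo.IP_Addresses a
ORDER BY a.visitDate DESC;
```

On the sample data this flags ids 2, 5, 6, 8 and 9, which agrees with the annotations.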

Thanks in advance to the dbo.Genius who solves this!

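2 Answers:

Answer 0: (score: 1)

Oh, you don't mean "today"; you mean any particular date in the past. You can do this by numbering each IP's visit dates sequentially in the subquery, so that a sequence number of 1 marks the first day that IP was seen. Note that the following changes the subquery from select distinct to group by, and then uses conditional aggregation in the outer query: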

SELECT V.visitdate, COUNT(*) AS TotalDistinctIPs,
       SUM(CASE WHEN [type] = 'C' THEN 1 ELSE 0 END) as CustomerIPs,
       SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) as FirstVisits
FROM (SELECT ip, [type], CAST(V.visitdate AS DATE) as VisitDate,
             row_number() over (partition by ip order by CAST(V.visitdate AS DATE)) as seqnum
      FROM IP_Addresses V
      GROUP BY ip, [type], CAST(V.visitdate AS DATE)
     ) v
GROUP BY V.visitdate
ORDER BY V.visitdate DESC;
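
Note that FirstVisits counts first visits of every type, while the New Customer IPs column in the question counts only type 'C'. One way to line the two up is to add the type test to the same conditional sum. A minimal sketch, assuming each IP has a single type on any given day and that an IP first seen as 'P' should not count as a new customer later; NewCustomerIPs and the alias a are names chosen here for illustration:

```sql
-- Sketch: count customer IPs whose first-ever visit day is the row's day.
SELECT v.VisitDate,
       COUNT(*) AS TotalDistinctIPs,
       SUM(CASE WHEN v.[type] = 'C' THEN 1 ELSE 0 END) AS CustomerIPs,
       SUM(CASE WHEN v.[type] = 'C' AND v.seqnum = 1 THEN 1 ELSE 0 END) AS NewCustomerIPs
FROM (SELECT a.IP,
             a.[type],
             CAST(a.visitDate AS DATE) AS VisitDate,
             -- number each IP's distinct visit days from earliest to latest
             ROW_NUMBER() OVER (PARTITION BY a.IP
                                ORDER BY CAST(a.visitDate AS DATE)) AS seqnum
      FROM dbo.IP_Addresses a
      GROUP BY a.IP, a.[type], CAST(a.visitDate AS DATE)
     ) v
GROUP BY v.VisitDate
ORDER BY v.VisitDate DESC;
```

Against the sample data in the question, NewCustomerIPs comes out as 1, 0, 1, 1, 0, 2 for 2014-05-30 down to 2014-05-25, which matches the desired output.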

Answer 1: (score: 0)

Sometimes changing COUNT(*) to COUNT(1) can save some time in a query. Have you tried that?