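I have a continuously updated database of IP addresses. I need to know how many unique IP addresses were added today that had not been added before today.
I tried doing this with a subquery, but the query now takes several minutes to process millions of records.
Here is the original query: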
SELECT
    visitDate,
    COUNT(*) AS TotalDistinctIPs,
    SUM(CASE WHEN [type] = 'C' THEN 1 ELSE 0 END) AS CustomerIPs
    -- SUM(CustomerIPs Where does not exist in prior day!)
FROM (
    SELECT DISTINCT
        V.ip,
        CAST(V.visitdate AS DATE) AS visitdate,
        [type]
    FROM IP_Addresses V
) x
GROUP BY
    visitdate
ORDER BY visitdate DESC
Here is some sample data, which I annotate below to illustrate what I am trying to do:
CREATE TABLE [dbo].[IP_Addresses](
[id] [int] IDENTITY(1,1) NOT NULL,
[IP] [varchar](20) NOT NULL,
[type] [char](1) NOT NULL,
[visitDate] [datetime] NULL,
CONSTRAINT [PK_IP_Addresses] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
SET IDENTITY_INSERT [dbo].[IP_Addresses] ON
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (1, N'192.168.0.1', N'C', CAST(0x0000A33B00A63920 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (2, N'192.168.0.2', N'C', CAST(0x0000A33B00C72EA0 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (3, N'192.168.0.4', N'P', CAST(0x0000A33A011C5F38 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (4, N'192.168.0.5', N'C', CAST(0x0000A33A00C72EA0 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (5, N'192.168.0.6', N'C', CAST(0x0000A33900F89EE0 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (6, N'192.168.0.7', N'C', CAST(0x0000A33800A63920 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (7, N'192.168.0.8', N'P', CAST(0x0000A33700875D84 AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (8, N'192.168.0.1', N'C', CAST(0x0000A3360089CA9C AS DateTime))
INSERT [dbo].[IP_Addresses] ([id], [IP], [type], [visitDate]) VALUES (9, N'192.168.0.5', N'C', CAST(0x0000A336006660FC AS DateTime))
SET IDENTITY_INSERT [dbo].[IP_Addresses] OFF
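The goal is to add the New Customer IPs column. This column must contain the count of unique customer IP addresses that did not visit on any earlier day.
Using this data set as an example, the desired output looks like this: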
visitDate   TotalDistinctIPs  CustomerIPs  New Customer IPs
2014-05-30  2                 2            1
2014-05-29  2                 1            0
2014-05-28  1                 1            1
2014-05-27  1                 1            1
2014-05-26  1                 0            0
2014-05-25  2                 2            2
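Row 1 (visitDate 2014-05-30): id = 1 is not a new customer (see id = 8); id = 2 is a new customer (it does not exist on any earlier day).
Row 2 (visitDate 2014-05-29): id = 3 is not a customer (type is 'P'); id = 4 is not a new customer (see id = 9 for the earlier visit).
Row 3 (visitDate 2014-05-28): id = 5 is a new customer.
And so on...
Thanks in advance to the dbo.Genius who solves this!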
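Answer 0: (score: 1)
Oh, you don't mean "today"; you mean any particular date in the past. You can do this by numbering each IP's visit dates in order in the subquery, so that seqnum = 1 marks the first date on which that IP appears. Note that the following changes the subquery from select distinct to group by, and then uses conditional aggregation in the outer query: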
SELECT v.VisitDate, COUNT(*) AS TotalDistinctIPs,
       SUM(CASE WHEN [type] = 'C' THEN 1 ELSE 0 END) AS CustomerIPs,
       SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS FirstVisits
FROM (SELECT ip, [type], CAST(V.visitdate AS DATE) AS VisitDate,
             -- seqnum = 1 is the earliest date on which this ip appears
             ROW_NUMBER() OVER (PARTITION BY ip ORDER BY CAST(V.visitdate AS DATE)) AS seqnum
      FROM IP_Addresses V
      GROUP BY ip, [type], CAST(V.visitdate AS DATE)
     ) v
GROUP BY v.VisitDate
ORDER BY v.VisitDate DESC;
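Note that FirstVisits counts the first appearance of every IP regardless of type, whereas the desired output in the question only counts customer IPs (for example, 2014-05-26 has one first-time 'P' visitor but a New Customer IPs value of 0). A minimal sketch of a variant that adds the type condition is below. It assumes that "new customer" means a type 'C' row whose IP has not appeared on any earlier date under any type; the column name NewCustomerIPs and the alias a are chosen here for illustration only.

```sql
SELECT v.VisitDate,
       COUNT(*) AS TotalDistinctIPs,
       SUM(CASE WHEN v.[type] = 'C' THEN 1 ELSE 0 END) AS CustomerIPs,
       -- Assumption: a new customer IP is a type 'C' row on the IP's first date
       SUM(CASE WHEN v.[type] = 'C' AND v.seqnum = 1 THEN 1 ELSE 0 END) AS NewCustomerIPs
FROM (SELECT a.IP, a.[type], CAST(a.visitDate AS DATE) AS VisitDate,
             -- Number each IP's distinct visit dates from earliest to latest
             ROW_NUMBER() OVER (PARTITION BY a.IP
                                ORDER BY CAST(a.visitDate AS DATE)) AS seqnum
      FROM dbo.IP_Addresses a
      GROUP BY a.IP, a.[type], CAST(a.visitDate AS DATE)
     ) v
GROUP BY v.VisitDate
ORDER BY v.VisitDate DESC;
```

Against the sample data in the question, this should reproduce the New Customer IPs column shown there. If the same IP can appear under both 'C' and 'P', you would need to decide whether an earlier 'P' visit disqualifies it from being a new customer; the sketch above treats it as disqualifying.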
Answer 1: (score: 0)
Sometimes changing COUNT(*) to COUNT(1) saves some time in a query. Have you tried that?