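I am writing a website user behavior analytics tool as a hobby project. It tracks user link clicks and the pages users eventually land on from those links. It distinguishes user sessions by a unique UIN identifier stored with each click.
I am writing a milestone and click data report, but the query is very slow. I have not found a way to make it run reasonably fast (under 5 seconds of execution time), so I would be very grateful if someone could help.
The following part of the query is very fast. It runs in about 0.05 seconds: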
declare @startDate date = '2013-01-01'
declare @endDate date = '2016-01-14'
declare @user int = 4
declare @country int = 224
select
p.PageId,
p.Name,
-- count of successful page landings
SUM(CASE WHEN m.MileStoneTypeId = 1 AND m.UserId = @user
THEN 1
ELSE 0
END) AS [Successful landings],
-- count of failed page landings
SUM(CASE WHEN m.MileStoneTypeId = 2 AND m.UserId = @user
THEN 1
ELSE 0
END) AS [Failed landings],
-- count of unfinished page landings
SUM(CASE WHEN m.MileStoneTypeId = 3 AND m.UserId = @user
THEN 1
ELSE 0
END) AS [Unfinished landings]
from
Page as p
inner join
Milestone as m
ON p.PageId = m.CampaignId
AND m.UserId = @user
AND m.Created >= @startDate
AND m.Created < @endDate
where
p.CountryId = @country
group by
p.PageId,
p.Name
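This is the full query, which executes very slowly. It runs in 45-60 seconds. The difference is that I am also trying to collect the number of clicks generated for each page that has milestones: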
declare @startDate date = '2013-01-01'
declare @endDate date = '2016-01-14'
declare @user int = 4
declare @country int = 224
select
p.PageId,
p.Name,
-- Unique clicks
(SELECT
COUNT(DISTINCT click.UIN)
FROM
Click as click
WHERE
click.PageId = p.PageId AND
click.Created >= @startDate AND
click.Created < @endDate AND
click.UserId = @user
) as [Unique clicks],
-- Total clicks
(SELECT
COUNT(click.UIN)
FROM
Click as click
WHERE
click.PageId = p.PageId AND
click.Created >= @startDate AND
click.Created < @endDate AND
click.UserId = @user
) as [Total clicks],
-- count of successful page landings
SUM(CASE WHEN m.MileStoneTypeId = 1 AND m.UserId = @user
THEN 1
ELSE 0
END) AS [Successful landings],
-- count of failed page landings
SUM(CASE WHEN m.MileStoneTypeId = 2 AND m.UserId = @user
THEN 1
ELSE 0
END) AS [Failed landings],
-- count of unfinished page landings
SUM(CASE WHEN m.MileStoneTypeId = 3 AND m.UserId = @user
THEN 1
ELSE 0
END) AS [Unfinished landings]
from
Page as p
inner join
Milestone as m
ON p.PageId = m.CampaignId
AND m.UserId = @user
AND m.Created >= @startDate
AND m.Created < @endDate
where
p.CountryId = @country
group by
p.PageId,
p.Name
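Running the click count queries as standalone queries is reasonably fast. Each of them (DISTINCT and non-distinct) runs in about 1 second.
This one is "fast" as a standalone query: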
-- Unique clicks
(SELECT
COUNT(DISTINCT click.UIN)
FROM
Click as click
WHERE
click.PageId = p.PageId AND
click.Created >= @startDate AND
click.Created < @endDate AND
click.UserId = @user
) as [Unique clicks],
This one is also "fast" as a standalone query:
-- Total clicks
(SELECT
COUNT(click.UIN)
FROM
Click as click
WHERE
click.PageId = p.PageId AND
click.Created >= @startDate AND
click.Created < @endDate AND
click.UserId = @user
) as [Total clicks],
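The problem appears when I try to combine everything into one big query. For some reason the standalone queries run very fast, but the combined query is very slow.
The Click table has a "UIN" column, whose value is assigned to each user when they arrive at the website. When they click a link, a row is inserted into the Click table with the user ID and the UIN. The UIN distinguishes user sessions, so UserId 4 with UIN abcdef123 can have multiple identical rows. The UIN is used to calculate the unique and total click counts within user sessions.
The Page table has about 1,000 rows. The Milestone table has about 200,000 rows and the Click table has about 10,000,000 rows.
Any idea how to improve the performance of the full query with the unique and total click counts included?
Here are the table contents and the target output.
Data from the Page table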
+--------+-----------------------+-----------+
| PageId | Name                  | CountryId |
+--------+-----------------------+-----------+
| 3095   | Registration          | 77        |
| 3110   | Customer registration | 77        |
| 5174   | View user details     | 77        |
+--------+-----------------------+-----------+
Data from the User table
+--------+------+
| UserId | Name |
+--------+------+
| 1      | Dan  |
| 2      | Mike |
| 3      | John |
+--------+------+
Data from the Click table
+---------+--------------------------------------+--------+-------------------------+--------+
| ClickId | Uin                                  | UserId | Created                 | PageId |
+---------+--------------------------------------+--------+-------------------------+--------+
| 1296600 | B420D0F4-20BE-49BE-AAC9-47DD858B68DD | 4301   | 2016-01-14 12:08:03:723 | 8603   |
| 1296599 | DA5877BA-8FF5-4671-8DF9-CCCBF1555BA1 | 4418   | 2016-01-14 12:07:46:930 | 2009   |
| 1296598 | C6790CB9-6DA6-4A8B-84AA-7D2D3A4B5787 | 4276   | 2016-01-14 12:07:43:563 | 8678   |
+---------+--------------------------------------+--------+-------------------------+--------+
Data from the Milestone table
+-------------+-----------------+------------+--------+-------------------------+--------+
| MilestoneId | MilestoneTypeId | CampaignId | UserId | Created                 | PageId |
+-------------+-----------------+------------+--------+-------------------------+--------+
| 1           | 1               | 1001       | 4      | 2014-02-06 13:18:04:487 | 52     |
| 2           | 1               | 1001       | 4      | 2014-02-06 13:41:01:257 | 9642   |
| 3           | 1               | 1001       | 4      | 2014-02-07 09:52:29:373 | 2393   |
+-------------+-----------------+------------+--------+-------------------------+--------+
Here is the output I want to achieve:
+---------+-----------------------+---------------+--------------+----------------------+-----------------+---------------------+
| Page Id | Page Name             | Unique clicks | Total clicks | Successful Landings  | Failed Landings | Unfinished Landings |
+---------+-----------------------+---------------+--------------+----------------------+-----------------+---------------------+
| 3095    | Registration          | 102           | 116          | 2                    | 0               | 0                   |
| 3110    | Customer registration | 3             | 6            | 1                    | 1               | 0                   |
| 5174    | View user details     | 13            | 13           | 0                    | 1               | 0                   |
| 5178    | Edit content page     | 11            | 11           | 1                    | 0               | 0                   |
| 6217    | Add new vehicle       | 18            | 18           | 2                    | 0               | 0                   |
+---------+-----------------------+---------------+--------------+----------------------+-----------------+---------------------+
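Answer 0 (score: 1)
This is slow because you run the "click" select twice for every row in the query.
Try joining it the same way you join the Milestone table, and add a group by user clause.
UPD. Could you please provide the table structure and data in the form of the following example?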
declare @Page as table (
PageId int,
etc
)
insert into @page (PageId, etc) values (3095, etc)
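Answer 1 (score: 1)
Clickstream data can be hard to work with, usually because of the volume of records generated. In this case, though, I think the problem comes from using correlated subqueries in the SELECT clause. In case you are not familiar with them: a correlated subquery is any subquery that references the outer query. These tend to hurt performance because the SQL engine can be forced to evaluate the subquery once for every row returned, which defeats the set-based nature of SQL.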
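To make the distinction concrete, here is a minimal sketch that runs both query shapes against a tiny in-memory table and checks that they return the same counts. It uses Python's built-in sqlite3 module purely as a runnable stand-in for SQL Server, and the rows and the UIN values ('aaa', 'bbb', ...) are made up for illustration. It shows only that the rewrite is equivalent; it says nothing about timings on the real tables.

```python
import sqlite3

# Hypothetical miniature of the Page and Click tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Page  (PageId INTEGER, Name TEXT);
    CREATE TABLE Click (ClickId INTEGER, Uin TEXT, UserId INTEGER, PageId INTEGER);
    INSERT INTO Page VALUES (3095, 'Registration'), (3110, 'Customer registration');
    INSERT INTO Click VALUES
        (1, 'aaa', 4, 3095),
        (2, 'aaa', 4, 3095),
        (3, 'bbb', 4, 3095),
        (4, 'ccc', 4, 3110),
        (5, 'ddd', 9, 3110);
""")

# Correlated form: each subquery references p.PageId from the outer query.
correlated = conn.execute("""
    SELECT p.PageId,
           (SELECT COUNT(DISTINCT c.Uin) FROM Click AS c
             WHERE c.PageId = p.PageId AND c.UserId = ?) AS UniqueClicks,
           (SELECT COUNT(c.Uin) FROM Click AS c
             WHERE c.PageId = p.PageId AND c.UserId = ?) AS TotalClicks
    FROM Page AS p
    ORDER BY p.PageId
""", (4, 4)).fetchall()

# Set-based form: aggregate Click once per page, then join the totals.
set_based = conn.execute("""
    SELECT p.PageId, c.UniqueClicks, c.TotalClicks
    FROM Page AS p
    INNER JOIN (SELECT PageId,
                       COUNT(DISTINCT Uin) AS UniqueClicks,
                       COUNT(Uin)          AS TotalClicks
                FROM Click
                WHERE UserId = ?
                GROUP BY PageId) AS c ON c.PageId = p.PageId
    ORDER BY p.PageId
""", (4,)).fetchall()

print(correlated)
print(set_based)
```

Both statements return one row per page with the same unique and total counts; the second reads the Click table in a single grouped pass instead of once per subquery.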
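I made some changes to your sample data. As provided, I could not get any records back to validate my result set, so I updated the values in the join columns to fix this:
Sample data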
DECLARE @Page TABLE
(
PageId INT,
Name VARCHAR(50),
CountryId INT
)
;
DECLARE @User TABLE
(
UserId INT,
Name VARCHAR(50)
)
;
DECLARE @Clicks TABLE
(
ClickId INT,
Uin UNIQUEIDENTIFIER,
UserId INT,
Created DATETIME,
PageId INT
)
;
DECLARE @Milestone TABLE
(
MilestoneId INT,
MilestoneTypeId INT,
CampaignId INT,
UserId INT,
Created DATETIME,
PageId INT
)
;
INSERT INTO @Page
(
PageId,
Name,
CountryId
)
VALUES
(3095, 'Registration', 77),
(3110, 'Customer registration', 77),
(5174, 'View user details', 77)
;
INSERT INTO @User
(
UserId,
Name
)
VALUES
(4301, 'Dan'),
(2, 'Mike'),
(3, 'John')
;
INSERT INTO @Clicks
(
ClickId,
Uin,
UserId,
Created,
PageId
)
VALUES
(1296600, 'B420D0F4-20BE-49BE-AAC9-47DD858B68DD', 4301, '2016-01-14 12:08:03:723', 3095),
(1296600, 'B420D0F4-20BE-49BE-AAC9-47DD858B68DD', 4301, '2016-01-14 12:08:03:723', 3095),
(1296599, 'DA5877BA-8FF5-4671-8DF9-CCCBF1555BA1', 4301, '2016-01-14 12:07:46:930', 3110),
(1296598, 'C6790CB9-6DA6-4A8B-84AA-7D2D3A4B5787', 4301, '2016-01-14 12:07:43:563', 5174)
;
INSERT INTO @Milestone
(
MilestoneId,
MilestoneTypeId,
CampaignId,
UserId,
Created,
PageId
)
VALUES
(1, 1, 1001, 4301, '2014-01-06 13:18:04:487', 3095),
(2, 1, 1001, 4301, '2014-01-06 13:41:01:257', 3110),
(3, 3, 1001, 4301, '2014-01-07 09:52:29:373', 5174)
;
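As you found with your original query, you cannot join Milestone directly to Click, because the two tables have different granularity. In my query I use CTEs to return the totals from each table. The main body of the query then joins the results.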
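The granularity problem can be seen on a handful of rows. The sketch below is an illustration under stated assumptions: it uses Python's sqlite3 module as a stand-in for SQL Server, and the three clicks and two milestones for page 3095 are invented. Joining the raw tables pairs every click with every milestone, so both counts are inflated; aggregating each table first gives the expected numbers.

```python
import sqlite3

# Hypothetical data: page 3095 has 3 clicks and 2 successful landings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Click     (ClickId INTEGER, Uin TEXT, PageId INTEGER);
    CREATE TABLE Milestone (MilestoneId INTEGER, MilestoneTypeId INTEGER, PageId INTEGER);
    INSERT INTO Click VALUES (1, 'aaa', 3095), (2, 'aaa', 3095), (3, 'bbb', 3095);
    INSERT INTO Milestone VALUES (1, 1, 3095), (2, 1, 3095);
""")

# Naive join: 3 clicks x 2 milestones = 6 joined rows, so both totals become 6.
naive = conn.execute("""
    SELECT COUNT(c.ClickId) AS TotalClicks,
           SUM(CASE WHEN m.MilestoneTypeId = 1 THEN 1 ELSE 0 END) AS SuccessfulLandings
    FROM Click AS c
    INNER JOIN Milestone AS m ON m.PageId = c.PageId
""").fetchone()

# Pre-aggregated join: each side is reduced to one row per page before joining.
pre_aggregated = conn.execute("""
    SELECT c.TotalClicks, m.SuccessfulLandings
    FROM (SELECT PageId, COUNT(ClickId) AS TotalClicks
          FROM Click
          GROUP BY PageId) AS c
    INNER JOIN (SELECT PageId,
                       SUM(CASE WHEN MilestoneTypeId = 1 THEN 1 ELSE 0 END) AS SuccessfulLandings
                FROM Milestone
                GROUP BY PageId) AS m ON m.PageId = c.PageId
""").fetchone()

print(naive)           # inflated by the row multiplication
print(pre_aggregated)  # one row per page on each side of the join
```

This is why each table is summarized on its own before the results are combined.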
Example
DECLARE @StartDate date = '2013-01-01';
DECLARE @EndDate date = '2016-01-15';
DECLARE @UserId int = 4301;
DECLARE @CountryId int = 77;
WITH Click AS
(
SELECT
UserId,
PageId,
COUNT(DISTINCT Uin) AS [Distinct Clicks],
COUNT(ClickId) AS [Total Clicks]
FROM
@Clicks
WHERE
UserId = @UserId
AND Created >= @StartDate
AND Created < @EndDate
GROUP BY
UserId,
PageId
),
Milestone AS
(
SELECT
UserId,
PageId,
SUM(CASE WHEN MileStoneTypeId = 1 THEN 1 ELSE 0 END) AS [Successful Landings],
SUM(CASE WHEN MileStoneTypeId = 2 THEN 1 ELSE 0 END) AS [Failed Landings],
SUM(CASE WHEN MileStoneTypeId = 3 THEN 1 ELSE 0 END) AS [Unfinished Landings]
FROM
@Milestone
WHERE
UserId = @UserId
AND Created >= @StartDate
AND Created < @EndDate
GROUP BY
UserId,
PageId
)
SELECT
p.PageId,
p.Name,
c.[Distinct Clicks],
c.[Total Clicks],
ms.[Successful Landings],
ms.[Failed Landings],
ms.[Unfinished Landings]
FROM
@Page AS p
INNER JOIN Click AS c ON c.PageId = p.PageId
INNER JOIN Milestone AS ms ON ms.PageId = c.PageId
AND ms.UserId = c.UserId
WHERE
p.CountryId = @CountryId
;
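One behavior of the inner joins is worth knowing: a page that has milestones but no clicks in the date range drops out of the result, whereas the query in the question would still list it with zero clicks. If those pages should be kept, one option is a LEFT JOIN to the click totals with COALESCE. The sketch below is an assumption-laden illustration, not part of the tested answer: it runs on Python's sqlite3 module rather than SQL Server, the CTE names ClickTotals and MilestoneTotals are hypothetical, and the data is invented.

```python
import sqlite3

# Hypothetical data: page 5174 has a failed landing but no clicks at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Page      (PageId INTEGER, Name TEXT);
    CREATE TABLE Click     (ClickId INTEGER, Uin TEXT, PageId INTEGER);
    CREATE TABLE Milestone (MilestoneId INTEGER, MilestoneTypeId INTEGER, PageId INTEGER);
    INSERT INTO Page VALUES (3095, 'Registration'), (5174, 'View user details');
    INSERT INTO Click VALUES (1, 'aaa', 3095), (2, 'aaa', 3095);
    INSERT INTO Milestone VALUES (1, 1, 3095), (2, 2, 5174);
""")

rows = conn.execute("""
    WITH ClickTotals AS (
        SELECT PageId,
               COUNT(DISTINCT Uin) AS DistinctClicks,
               COUNT(ClickId)      AS TotalClicks
        FROM Click
        GROUP BY PageId
    ),
    MilestoneTotals AS (
        SELECT PageId,
               SUM(CASE WHEN MilestoneTypeId = 1 THEN 1 ELSE 0 END) AS Successful,
               SUM(CASE WHEN MilestoneTypeId = 2 THEN 1 ELSE 0 END) AS Failed
        FROM Milestone
        GROUP BY PageId
    )
    SELECT p.PageId,
           COALESCE(c.DistinctClicks, 0) AS DistinctClicks,  -- 0 when no clicks exist
           COALESCE(c.TotalClicks, 0)    AS TotalClicks,
           ms.Successful,
           ms.Failed
    FROM Page AS p
    INNER JOIN MilestoneTotals AS ms ON ms.PageId = p.PageId
    LEFT JOIN ClickTotals AS c ON c.PageId = p.PageId
    ORDER BY p.PageId
""").fetchall()

print(rows)
```

Page 5174 stays in the output with zero clicks instead of disappearing. Whether that is the desired report is a design choice for the asker.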
Answer 2 (score: 0)
select count(1) from (select distinct column from table) as t;
If you want to see where most of the cost goes, you can use the following mode:
set showplan_all on
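Then check the plan output for the query. Alternatively, just click Display Estimated Execution Plan in Microsoft SQL Server Management Studio.
Hope this helps :)
Answer 3 (score: 0)
You should convert the "clicks" into functions and call those functions from the query. Using the "clicks" as subqueries will run slowly, because they are executed many times, once for each row.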