This is my input data:
GroupId Serial Action
1 1 Start
1 2 Run
1 3 Jump
1 8 End
2 9 Shop
2 10 Start
2 11 Run
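For each sequence of activities within a group, I want to find pairs of actions where Action1.SerialNo = Action2.SerialNo + k, and count how many times each pair occurs.
Suppose k = 1; then the output will be: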
FirstAction NextAction Frequency
Start Run 2
Run Jump 1
Shop Start 1
If the input table contains millions of entries, how can I do this quickly in SQL?
Answer 0 (score: 1)
tful, this should produce the result you want, but I don't know whether it will be as fast as you'd like. It's worth a try.
create table Actions(
GroupId int,
Serial int,
"Action" varchar(20) not null,
primary key (GroupId, Serial)
);
insert into Actions values
(1,1,'Start'), (1,2,'Run'), (1,3,'Jump'),
(1,8,'End'), (2,9,'Shop'), (2,10,'Start'),
(2,11,'Run');
go
declare @k int = 1;
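with ActionsDoubled(GroupId,Serial,Tag,"Action") as (
-- Tag 'a': every row at its own Serial.
select
GroupId, Serial, 'a', "Action"
from Actions as A
union all
-- Tag 'b': every row shifted back by @k, so it lines up with the row @k before it.
select
GroupId, Serial-@k, 'b', "Action"
from Actions
as B
), Pivoted(GroupId,Serial,a,b) as (
-- GroupId is carried through the pivot so that pairs never span two groups.
select GroupId,Serial,a,b
from ActionsDoubled
pivot (
max("Action") for Tag in ([a],[b])
) as P
)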
select
a, b, count(*) as ct
from Pivoted
where a is not NULL and b is not NULL
group by a,b
order by a,b;
go
drop table Actions;
If you need to run the same calculation for various values of @k over data that does not change, this may work better in the long run:
declare @k int = 1;
select
GroupId, Serial, 'a' as Tag, "Action"
into ActionsDoubled
from Actions as A
union all
select
GroupId, Serial-@k, 'b', "Action"
from Actions
as B;
go
-- GroupId is part of both keys so that pairs never span two groups.
create unique clustered index AD_S on ActionsDoubled(GroupId,Serial,Tag);
create index AD_a on ActionsDoubled(Tag,GroupId,Serial);
go
with Pivoted(GroupId,Serial,a,b) as (
select GroupId,Serial,a,b
from ActionsDoubled
pivot (
max("Action") for Tag in ([a],[b])
) as P
)
select
a, b, count(*) as ct
from Pivoted
where a is not NULL and b is not NULL
group by a,b
order by a,b;
go
drop table ActionsDoubled;
Answer 1 (score: 0)
DECLARE @k int = 1;
SELECT a1.Action AS FirstAction, a2.Action AS NextAction, COUNT(*) AS Frequency
FROM Activities a1 JOIN Activities a2
-- a2 is the row k serial numbers after a1 within the same group
ON (a1.GroupId = a2.GroupId AND a2.Serial = a1.Serial + @k)
GROUP BY a1.Action, a2.Action;
Answer 2 (score: 0)
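The catch is this: your query has to go through every row no matter what.
You can make the work more manageable for the database by processing each group separately, as its own query, especially if each group is small.
A lot goes on behind the scenes, and a query that has to scan the whole table can end up many times slower than a series of small queries that together cover all the millions of rows.
For example: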
--Stickler for clean formatting...
SELECT
a1.Action AS FirstAction,
a2.Action AS NextAction,
COUNT(*) AS Frequency
FROM
Activities a1 JOIN Activities a2
ON (a1.groupid = a2.groupid
AND a2.Serial = a1.Serial + @k) -- a2 is k serial numbers after a1
WHERE
a1.groupid = 1
GROUP BY
a1.Action,
a2.Action;
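The query above covers only GroupId 1. Here is a rough sketch of one way to extend it to every group and combine the results. It rests on several assumptions: SQL Server syntax, the Activities table from the query above, and NextAction being the row whose Serial is @k higher than FirstAction's, as in the expected output in the question. The names #PairCounts, GroupCursor and @g are hypothetical. Whether this beats a single set-based query depends on the data and the indexes, so it is worth measuring both.

```sql
-- Sketch only: #PairCounts, GroupCursor and @g are hypothetical names.
declare @k int = 1;
declare @g int;

-- Holds the per-group pair counts until they are combined.
create table #PairCounts (
  FirstAction varchar(20) not null,
  NextAction  varchar(20) not null,
  Frequency   int not null
);

declare GroupCursor cursor local fast_forward for
  select distinct GroupId from Activities;

open GroupCursor;
fetch next from GroupCursor into @g;

while @@fetch_status = 0
begin
  -- Count the pairs inside one group only.
  insert into #PairCounts (FirstAction, NextAction, Frequency)
  select a1.[Action], a2.[Action], count(*)
  from Activities as a1
  join Activities as a2
    on a2.GroupId = a1.GroupId
   and a2.Serial = a1.Serial + @k
  where a1.GroupId = @g
  group by a1.[Action], a2.[Action];

  fetch next from GroupCursor into @g;
end

close GroupCursor;
deallocate GroupCursor;

-- Sum the per-group counts into the overall frequencies.
select FirstAction, NextAction, sum(Frequency) as Frequency
from #PairCounts
group by FirstAction, NextAction
order by FirstAction, NextAction;

drop table #PairCounts;
```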
By the way, the table does have an index on (GroupId, Serial), right?
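If that index does not exist yet, something along these lines might do. This is a sketch assuming SQL Server and the Activities table used above; the index name is hypothetical, and the INCLUDE column is my own addition so that the self-join can be answered from the index without touching the base table. If (GroupId, Serial) is already the clustered primary key, as in the Actions table of the first answer, an extra index should not be needed.

```sql
-- Hypothetical index name; assumes the Activities table used above.
create index IX_Activities_GroupId_Serial
  on Activities (GroupId, Serial)
  include ([Action]);
```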