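I'm trying to analyze some baseball statistics, and I'm having trouble implementing what seems like it should be a simple task. Take a look at the following result set: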
GAME_PK REC_SEQ BatterId PlayNumber EventType
287576 6 462101 1 single
287576 14 519048 2 single
287576 25 435079 3 strikeout
287576 26 435079 4 stolen_base_home
287576 28 435079 5 stolen_base_2b
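I generate the PlayNumber column with ROW_NUMBER() OVER (ORDER BY GAME_PK, REC_SEQ). The rest comes straight from the MLB statistics database. REC_SEQ is the sequence number of the event within the game. EventType is essentially the result of the at-bat.
I want PlayNumber to increase only when BatterId changes, but it must still respect the REC_SEQ order. So I don't think I can use RANK or DENSE_RANK, although they seem very close to what I need.
I'd like my result set to look like this: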
GAME_PK REC_SEQ BatterId PlayNumber EventType
287576 6 462101 1 single
287576 14 519048 2 single
287576 25 435079 3 strikeout
287576 26 435079 3 stolen_base_home
287576 28 435079 3 stolen_base_2b
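Any help is appreciated.
Thanks!
Edit: A batter can come up more than once in a game. Each appearance should be assigned a new PlayNumber. Basically, every new at-bat needs a new PlayNumber.
Answer 0 (score: 1)
Edit: It seems the only way to do this is to work out where each group starts and ends by finding which consecutive records share a BatterId. This is done by joining the records to themselves with a rownum offset of 1 to find where each group starts. Once we have collected the start of each group (GroupSets), we can determine which group each individual record belongs to, which produces the correct numbering: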
-- Orders by rec_seq only, so this assumes stats holds a single game.
with GroupSets as (
    -- one row per group start, numbered in rec_seq order
    select
        row_number() over (order by s1.rec_seq) as rownum,
        s1.game_pk, s1.rec_seq, s1.batterid, s2.batterid as prevbatterid,
        s1.eventtype
    from (select *, row_number() over (order by rec_seq) as rownum from stats) s1
    -- s2 is the row immediately before s1
    left join (select rec_seq, batterid,
                      row_number() over (order by rec_seq) as rownum from stats) s2
        on s1.rownum = s2.rownum + 1
    where s1.batterid != s2.batterid or s2.batterid is null
)
select
    game_pk,
    rec_seq,
    batterid,
    -- the latest group start at or before this row gives its number
    (select max(rownum) from GroupSets gs where gs.Rec_Seq <= s1.rec_seq) as PlayNumber,
    eventtype
from
    stats s1;
Demo: http://www.sqlfiddle.com/#!3/a5e68/50
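For comparison, the same grouping can be sketched without a self-join by using the difference of two row numbers, which is constant within a run of consecutive rows for one batter. This is only a sketch, not the method used above: the names numbered, runs, grp and run_start are made up for illustration, and the VALUES list (SQL Server 2008 or later) simply stands in for the stats table with the sample rows from the question.

```sql
-- Sample rows from the question; this CTE stands in for the real stats table.
with stats as (
    select * from (values
        (287576,  6, 462101, 'single'),
        (287576, 14, 519048, 'single'),
        (287576, 25, 435079, 'strikeout'),
        (287576, 26, 435079, 'stolen_base_home'),
        (287576, 28, 435079, 'stolen_base_2b')
    ) as v (game_pk, rec_seq, batterid, eventtype)
),
numbered as (
    -- Within a run of consecutive rows for one batter both counters advance
    -- together, so their difference (grp) stays constant for that run.
    select s.*,
           row_number() over (partition by game_pk order by rec_seq)
         - row_number() over (partition by game_pk, batterid order by rec_seq) as grp
    from stats s
),
runs as (
    -- (game_pk, batterid, grp) identifies one run; its first rec_seq fixes its position.
    select n.*,
           min(rec_seq) over (partition by game_pk, batterid, grp) as run_start
    from numbered n
)
select game_pk, rec_seq, batterid,
       dense_rank() over (order by game_pk, run_start) as PlayNumber,
       eventtype
from runs
order by game_pk, rec_seq;
```

On the five sample rows this yields PlayNumber 1, 2, 3, 3, 3. A batter who comes up again later in the game gets a different grp value, so that appearance starts a new PlayNumber.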
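Old code that does not handle interleaved batters:
Actually, the DENSE_RANK() function should do it. However, we need to rank by the MIN(REC_SEQ) value of each BatterId group, so that REC_SEQ controls the order. Something like this should do it: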
select
    s1.game_pk,
    s1.rec_seq,
    s1.batterID,
    -- rank each batter by the rec_seq of their first event
    dense_rank() over (order by s2.rec_seq) as PlayNumber,
    s1.EventType
from
    stats s1
join
    (select batterid, min(rec_seq) rec_seq
     from stats group by batterid) s2 on s1.batterid = s2.batterid
order by
    s1.rec_seq
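Answer 1 (score: 0)
This is hard, but it can be done in SQL Server. I'll note that Oracle's analytic functions make this easier.
The idea is as follows:
I think the following code does the trick: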
with s_enum as
(
    -- number the events within each game in rec_seq order
    select s.*, ROW_NUMBER() over (partition by game_pk order by rec_seq) as Seq
    from stats s
), s_cp as
(
    -- among the rows with FirstInSeq = 1, BattingSeq numbers the at-bats 1, 2, 3, ...
    select t.*, ROW_NUMBER() over (partition by game_pk, FirstInSeq order by Seq) as BattingSeq
    from
    (
        -- flag rows whose previous row is missing or belongs to a different batter
        select curr.*,
               (case when prev.BatterId = curr.BatterId then 0 else 1 end) as FirstInSeq
        from s_enum curr
        left outer join s_enum prev
            on curr.game_pk = prev.game_pk
            and curr.Seq = prev.Seq + 1
    ) t
)
-- each row takes the number of the latest at-bat start for its batter
select s.game_pk, s.batterid, s.rec_seq, MAX(bs.BattingSeq) as PlayNumber
from stats s
join
(
    select s.*
    from s_cp s
    where FirstInSeq = 1
) bs
    on s.game_pk = bs.game_pk
    and s.batterid = bs.batterid
    and s.rec_seq >= bs.rec_seq
group by s.game_pk, s.batterid, s.rec_seq
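To illustrate the remark about analytic functions: where LAG and ordered window aggregates are available (in SQL Server that means 2012 or later, which is an assumption about the environment), the same result can be sketched with a flag and a running sum. The names flagged and is_new_at_bat are made up for illustration, and the VALUES list stands in for the stats table with the sample rows from the question.

```sql
-- Sample rows from the question; this CTE stands in for the real stats table.
with stats as (
    select * from (values
        (287576,  6, 462101, 'single'),
        (287576, 14, 519048, 'single'),
        (287576, 25, 435079, 'strikeout'),
        (287576, 26, 435079, 'stolen_base_home'),
        (287576, 28, 435079, 'stolen_base_2b')
    ) as v (game_pk, rec_seq, batterid, eventtype)
),
flagged as (
    -- 1 when the previous event in the game is missing or has a different batter.
    -- LAG returns NULL on the first row, so the comparison fails and ELSE applies.
    select s.*,
           case when lag(batterid) over (partition by game_pk order by rec_seq) = batterid
                then 0 else 1 end as is_new_at_bat
    from stats s
)
select game_pk, rec_seq, batterid,
       -- running count of at-bat starts up to and including this row
       sum(is_new_at_bat) over (order by game_pk, rec_seq
                                rows unbounded preceding) as PlayNumber,
       eventtype
from flagged
order by game_pk, rec_seq;
```

On the five sample rows this yields PlayNumber 1, 2, 3, 3, 3. The running sum is ordered by game_pk and rec_seq, so numbering continues across games, matching the ROW_NUMBER() ordering in the question. Add partition by game_pk to the sum if numbering should restart for each game.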