SQL-将以前的值从一列传播到多个新列

时间:2019-01-29 13:09:37

标签: sql r sql-server-2012 dplyr pivot

我有一个Customer_ID的SQL表,显示按年份付款。第一个(许多)客户显示如下:

 ID    Payment    Year
112          0    2004
112          0    2005
112          0    2006
112       9592    2007
112      12332    2008
112       9234    2011
112       5400    2012
112       7392    2014
112       8321    2015

请注意,有些年份不见了。我需要为每行创建10个新列,以显示前10年的付款。结果表应如下所示:

 ID    Payment    Year   T-1  T-2  T-3  T-4  T-5  T-6  T-7  T-8  T-9 T-10   
112          0    2004  NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
112          0    2005     0 NULL NULL NULL NULL NULL NULL NULL NULL NULL
112          0    2006     0    0 NULL NULL NULL NULL NULL NULL NULL NULL
112        952    2007     0    0    0 NULL NULL NULL NULL NULL NULL NULL
112       1232    2008   952    0    0    0 NULL NULL NULL NULL NULL NULL
112        924    2011  NULL NULL 1232  952    0    0    0 NULL NULL NULL 
112        500    2012   924 NULL NULL 1232  952    0    0    0 NULL NULL 
112        392    2014  NULL  500  924 NULL NULL 1232  952    0    0    0
112        821    2015   392 NULL  500  924 NULL NULL 1232  952    0    0

我很清楚这是数据的大量重复,因此似乎很奇怪。但是,我仍然想这样做! (数据正在为预测模型准备,在该模型中,以前的付款(和其他信息)将用于预测当年的付款)

我不太确定从哪里开始。我一直在寻找使用数据透视的方法,但无法弄清楚如何从客户的前一年中选择价值。

我非常想在SQL中执行此操作。如果那不可能,我可以将表复制到R中-但我偏爱使用SQL。

非常感谢任何帮助。

2 个答案:

答案 0 :(得分:1)

如果您有完整的数据,则可以使用lag()

select t.*,
       lag(payment, 1) over (partition by id order by year) as t_1,
       lag(payment, 2) over (partition by id order by year) as t_2,
       . . .
from t;

但是,对于缺少中间年份的情况,left join可能会更简单:

select t.*,
       t1.payment as t_1,
       t2.payment as t_2,
       . . .
from t left join
     t t1
     on t1.id = t.id and
        t1.year = t.year - 1 left join
     t t2
     on t1.id = t.id and
        t1.year = t.year - 2 left join
     . . .;

答案 1 :(得分:0)

我想你的朋友将成为LAG

这是一个实现:

Declare @t table (
    ID int,
    Payment int,
    Yr int
)
Insert Into @t Values(112,0,2004)
Insert Into @t Values(112,0,2005)
Insert Into @t Values(112,0,2006)
Insert Into @t Values(112,9592,2007)
Insert Into @t Values(112,12332,2008)
Insert Into @t Values(112,9234,2011)
Insert Into @t Values(112,5400,2012)
Insert Into @t Values(112,7392,2014)
Insert Into @t Values(112,8321,2015)

Insert Into @t Values(113,0,2009)
Insert Into @t Values(113,9234,2011)
Insert Into @t Values(113,5400,2013)
Insert Into @t Values(113,8321,2015)


;with E1(n) as (Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1)
,E2(n) as (Select 1 From E1 a, E1 b)
,E4(n) as (Select 1 From E2 a, E2 b)
,E5(n) as (Select row_number() over(order by isnull(null,1)) From E4 a, E1 b)
,IDYears as (
    Select z.ID, Yr = y.n
    From (
        Select 
            Id, 
            MinYear = min(Yr),
            MaxYear = max(Yr)
        From @t a
        Group By Id
        ) z
    Inner Join E5 y On y.n between z.MinYear and z.MaxYear
)

Select 
    *,
    [t-1] = Lag(B.Payment, 1) Over(Partition By a.ID Order By a.Yr),
    [t-2] = Lag(B.Payment, 2) Over(Partition By a.ID Order By a.Yr),
    [t-3] = Lag(B.Payment, 3) Over(Partition By a.ID Order By a.Yr),
    [t-4] = Lag(B.Payment, 4) Over(Partition By a.ID Order By a.Yr),
    [t-5] = Lag(B.Payment, 5) Over(Partition By a.ID Order By a.Yr),
    [t-6] = Lag(B.Payment, 6) Over(Partition By a.ID Order By a.Yr),
    [t-7] = Lag(B.Payment, 7) Over(Partition By a.ID Order By a.Yr),
    [t-8] = Lag(B.Payment, 8) Over(Partition By a.ID Order By a.Yr),
    [t-9] = Lag(B.Payment, 9) Over(Partition By a.ID Order By a.Yr),
    [t-10] = Lag(B.Payment, 10) Over(Partition By a.ID Order By a.Yr)
From IDYears a
Left Join @t b On a.ID = b.ID and a.Yr = b.Yr
Order By A.ID