sas - How to match the current row with the previous row, then assign a value in another column

Posted: 2016-11-21 12:03:34

Tags: sql sas

This is the data I have:

data haveb1;
infile cards truncover expandtabs;
input MC $ ET $ Date : date9. Time : hhmmss5. TPMC $ PXMC $ Site $ Dia MV;
format date date9. time hhmm5.;
cards;
US0001 CRE 29MAY13 0:00 7611 HTELI1124P 1 . 2734440.00000
US0001 CRE 31JAN14 0:00 7402 HTELI1015P 2 . 2735017.00000
US000323 Removal 31OCT12 0:00 7416 HTELI1079P 3 . 1346049.00000
US000323 Inst 11JAN13 0:00 7408 HTELI1034P 3 . 1346049.00000
US000323 Removal 24MAY14 0:00 7408 HTELI1034P 3 . 1812537.00000
US000328 CRE 03FEB13 0:00 7209 HTELI1115P 3 . 2040610.00000
US000328 CRE 18JUL14 0:00 7218 HTELI1152P 3 . 2134438.00000
US000328 Inst 15FEB15 0:00 7508 HTELI1098P 3 . 2180863.00000
US000328 CRE 21MAY15 0:00 7212 HTELI1098P 3 . 2232830.00000
US000328 CRE 01OCT15 0:00 7111 HTELI1215P 2 . 2232830.00000
US000329 Removal 21MAR14 0:00 7110 HTELI1148P 2 . 2130325.00000
US000329 CRE 18SEP14 0:00 7517 HTELI1211P 3 . 2130325.00000
US000331 CRE 02SEP13 0:00 7207 HTELI020 2 . 2059478.00000
US000331 Removal 17JUN15 0:00 7207 HTELI020 2 . 2689105.00000
US000331 Inst 19APR16 0:00 7114 HTELI1147P 3 . 2689105.00000
US000334 Inst 26JUN13 0:00 7512 HTELI1023P 2 . 2535592.00000
US000334 CRE 04JUL14 0:00 7217 HTELI1145P 2 . 2815903.00000
;
run;

What I want to do is "count" the number of times an MC changes from one TPMC to another. The final output should look like this:

MC  ET  Date    Time    TPMC    Change  PXMC    Site    Dia MV
US0001  CRE 29May2013   0:00    7611    0   HTELI112    1       2734440
US0001  CRE 31Jan2014   0:00    7402    1   HTELI101    2       2735017
US000323    Removal 31Oct2012   0:00    7416    0   HTELI107    3       1346049
US000323    Inst    11Jan2013   0:00    7408    1   HTELI103    3       1346049
US000323    Removal 24May2014   0:00    7408    0   HTELI103    3       1812537
US000328    CRE 03Feb2013   0:00    7209    1   HTELI111    3       2040610
US000328    CRE 18Jul2014   0:00    7218    1   HTELI115    3       2134438
US000328    Inst    15Feb2015   0:00    7508    1   HTELI109    3       2180863
US000328    CRE 21May2015   0:00    7212    1   HTELI109    3       2232830
US000328    CRE 01Oct2015   0:00    7111    1   HTELI121    2       2232830
US000329    Removal 21Mar2014   0:00    7110    0   HTELI114    2       2130325
US000329    CRE 18Sep2014   0:00    7517    1   HTELI121    3       2130325
US000331    CRE 02Sep2013   0:00    7207    0   HTELI020    2       2059478
US000331    Removal 17Jun2015   0:00    7207    0   HTELI020    2       2689105
US000331    Inst    19Apr2016   0:00    7114    1   HTELI114    3       2689105
US000334    Inst    26Jun2013   0:00    7512    0   HTELI102    2       2535592
US000334    CRE 04Jul2014   0:00    7217    1   HTELI114    2       2815903

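Basically, the Change column is always 0 on the first row of each MC. After that, if the TPMC in the current row differs from the TPMC in the previous row, Change is 1; otherwise it is 0.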

How can I do this?
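One possible way to implement the rule just described is a DATA step that compares each row's TPMC with the previous row's via LAG, resetting at the first row of each MC. This is a minimal sketch, not code from either answer; the dataset names tpmc_sample, tpmc_sorted and tpmc_flag are hypothetical, and the sample keeps only a few MC/Date/TPMC rows from the question. Under this rule, the first US000328 row would get 0, whereas the expected output above shows 1 for it.

```sas
/* Hypothetical sample: a few MC/Date/TPMC rows taken from haveb1 above. */
data tpmc_sample;
  input MC $ Date : date9. TPMC $;
  format Date date9.;
  datalines;
US0001 29MAY13 7611
US0001 31JAN14 7402
US000323 31OCT12 7416
US000323 11JAN13 7408
US000323 24MAY14 7408
;
run;

/* Order rows chronologically within each MC. */
proc sort data=tpmc_sample out=tpmc_sorted;
  by MC Date;
run;

data tpmc_flag;
  set tpmc_sorted;
  by MC;
  /* LAG keeps its own queue, so call it on every row, never inside IF/ELSE. */
  prev_TPMC = lag(TPMC);
  if first.MC then Change = 0;        /* first row of each MC is always 0 */
  else Change = (TPMC ne prev_TPMC);  /* 1 if TPMC differs from the previous row, else 0 */
  drop prev_TPMC;
run;
```

Calling LAG unconditionally matters: if it ran only in the ELSE branch, it would return the TPMC from the last time the ELSE branch executed rather than from the previous row.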

When I run the code exactly as Chris J wrote it (in the answer), I get the following result, which unfortunately does not meet the requirement:

proc sort data=haveb1 ;
  by MC Date Time ;
run ;

data want ;
  set haveb1 ;
  by MC Date Time TPMC notsorted ;
  if first.MC then Change = 0 ;
  else
  if first.TPMC then Change + 1 ;
run ;

Result:

US0001  Lath    02JAN13 19:24               876 2660403.00000   1   0
US0001  CRE 29MAY13 0:00    7611    HTELI1124P  1   .   2734440.00000   1   0
US0001  CRE 31JAN14 0:00    7402    HTELI1015P  2   .   2735017.00000   1   0
US0001  Lath    12JAN15 7:00                .   2900334.00000   1   0
US000323    Lath    13OCT12 19:37               852.2   1332753.00000   1   0
US000323    WI  25OCT12 0:00                .   1342148.00000   1   0
US000323    Remov   31OCT12 0:00    7416    HTELI1079P  3   .   1346049.00000   1   0
US000323    Lath    31OCT12 14:03               890.5   1346049.00000   1   0
US000323    Installation    11JAN13 0:00    7408    HTELI1034P  3   .   1346049.00000   1   0
US000323    Lath    16.marras.13    19:52               888.7   1417443.00000   1   0
US000323    Lath    12OCT13 13:49               886.7   1606899.00000   1   0
US000323    Lath    12OCT13 14:17               886.7   1606899.00000   1   0
US000323    Remov   24MAY14 0:00    7408    HTELI1034P  3   .   1812537.00000   1   0
US000328    Meas    17OCT12 16:11               .   1941116.00000   .   0
US000328    Meas    17OCT12 16:11               852.2   1941116.00000   .   1
US000328    Meas    18OCT12 10:53               849.8   1943064.00000   .   0
US000328    Meas    18OCT12 10:53               849.8   1942090.00000   .   1
US000328    Meas    18OCT12 10:53               852.1   1943064.00000   .   2
US000328    Meas    18OCT12 10:53               852.1   1942090.00000   .   3
US000328    Meas    20OCT12 10:17               849.7   1944562.00000   .   0
US000328    Meas    20OCT12 10:17               851.9   1944562.00000   .   1

2 answers:

Answer 0 (score: 1)

Use FIRST. processing with the NOTSORTED option:

proc sort data=haveb1 ;
  by MC Date Time ;
run ;

data want ;
  set haveb1 ;
  by MC Date Time TPMC notsorted ;
  if first.MC then Change = 0 ;
  else
  if first.TPMC then Change + 1 ;
run ;
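Two details of the code above explain why it does not produce the requested 0/1 flag. Because Date and Time come before TPMC in the BY statement, FIRST.TPMC is set to 1 whenever Date or Time changes, not only when TPMC changes. And `Change + 1` is a SAS sum statement, so Change is retained and accumulates across rows instead of being a per-row flag. A possible adjustment, sketched here as an assumption rather than as the answerer's code, keeps the FIRST./NOTSORTED idea but sorts by MC and Date and then groups only by MC and TPMC. The dataset and variable names tpmc_sample, tpmc_sorted, tpmc_first and Changes_so_far are hypothetical.

```sas
/* Hypothetical sample: a few MC/Date/TPMC rows taken from haveb1 in the question. */
data tpmc_sample;
  input MC $ Date : date9. TPMC $;
  format Date date9.;
  datalines;
US000328 03FEB13 7209
US000328 18JUL14 7218
US000328 15FEB15 7508
US000331 02SEP13 7207
US000331 17JUN15 7207
US000331 19APR16 7114
;
run;

proc sort data=tpmc_sample out=tpmc_sorted;
  by MC Date;
run;

data tpmc_first;
  set tpmc_sorted;
  /* NOTSORTED: a new TPMC group starts whenever TPMC differs from the previous row */
  by MC TPMC notsorted;
  /* 0/1 flag: start of a new TPMC group that is not also the start of a new MC */
  Change = (first.TPMC and not first.MC);
  /* Running count of TPMC changes within each MC (sum statement, retained) */
  if first.MC then Changes_so_far = 0;
  Changes_so_far + Change;
run;
```

Change gives the per-row flag from the question, while Changes_so_far gives a cumulative count per MC, in case the "count" wording was meant literally.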

Answer 1 (score: 1)

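Consider the following two PROC SQL runs, which follow an initial DATA step (that step could also be handled while importing the data). If SAS supported CTEs like other RDBMSs, this could be done in a single run. Of course, you can still embed the first query in the second one, in place of each reference to want1 in PROC SQL.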

Steps

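  1. Beforehand, add a primary ID column from the row number _N_ (customary in most SQL database tables) to serve as a tie-breaker for records with the same date: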

    data haveb1;
        set haveb1;    
        id = _N_;       
    run;
    
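  2. The first SQL uses a correlated aggregate subquery to return, for each MC group in Date order, a count of earlier rows whose TPMC differs from the current row's and is non-empty (LENGTH(TPMC) > 1). Note: TPMC must be a character variable for the query below to work.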

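  3. The second SQL corrects the new Change column so that it increments by 1 only after these tied values. The CASE statement sets Change to 0 for rows with a missing TPMC.
  4. SAS code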

    proc sql;
        CREATE TABLE want1 AS
    
        SELECT h.MC, h.ET, h.Date, h.Time, 
    
             (SELECT Count(*) FROM haveb1 sub
              WHERE sub.MC = h.MC AND sub.TPMC ne h.TPMC AND LENGTH(sub.TPMC) > 1
              AND (sub.Date < h.Date OR sub.Date = h.Date AND sub.id < h.id )) AS Change, 
    
             h.TPMC, h.PXMC, h.Site, h.Dia, h.MV
        FROM haveb1 h;
    quit;
    
    proc sql;
        CREATE TABLE want2 AS
    
        SELECT w.MC, w.ET, w.Date, w.Time,  
             CASE WHEN LENGTH(w.TPMC) > 1 
                  THEN (SELECT Count(*)
                        FROM (SELECT DISTINCT t.MC, t.Change FROM want1 t) sub
                        WHERE sub.MC = w.MC AND sub.Change < w.Change)  
                  ELSE 0 
             END AS Change,     
    
             w.TPMC, w.PXMC, w.Site, w.Dia, w.MV
        FROM want1 w;
    quit;
    

    SAS Proc SQL Output
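If only the 0/1 flag from the question is needed, a shorter PROC SQL sketch can look up the previous row of the same MC through the id column from step 1 and compare TPMC values directly. This assumes id was assigned after the data were already in MC and Date order (as in the question's sample); the table names tpmc_ids, tpmc_prev, tpmc_sql and the column prev_id are hypothetical.

```sas
/* Hypothetical sample with the row-number id from step 1 (data already in MC/Date order). */
data tpmc_ids;
  input MC $ Date : date9. TPMC $;
  format Date date9.;
  id = _n_;
  datalines;
US000323 31OCT12 7416
US000323 11JAN13 7408
US000323 24MAY14 7408
US000329 21MAR14 7110
US000329 18SEP14 7517
;
run;

proc sql;
  /* For each row, find the id of the previous row in the same MC (missing if none). */
  create table tpmc_prev as
  select h.*,
         (select max(s.id) from tpmc_ids as s
           where s.MC = h.MC and s.id < h.id) as prev_id
  from tpmc_ids as h;

  /* Flag a change when a previous row exists and its TPMC differs. */
  create table tpmc_sql as
  select w.id, w.MC, w.Date, w.TPMC,
         case when w.prev_id is missing then 0
              when p.TPMC ne w.TPMC then 1
              else 0
         end as Change
  from tpmc_prev as w
       left join tpmc_ids as p
       on p.id = w.prev_id
  order by w.id;
quit;
```

The LEFT JOIN keeps the first row of each MC, where prev_id is missing and no match exists, so the CASE expression can assign it 0.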