I have the following data:
data haveb1;
infile cards truncover expandtabs;
input MC $ ET $ Date : date9. Time : hhmmss5. TPMC $ PXMC $ Site $ Dia MV;
format date date9. time hhmm5.;
cards;
US0001 CRE 29MAY13 0:00 7611 HTELI1124P 1 . 2734440.00000
US0001 CRE 31JAN14 0:00 7402 HTELI1015P 2 . 2735017.00000
US000323 Removal 31OCT12 0:00 7416 HTELI1079P 3 . 1346049.00000
US000323 Inst 11JAN13 0:00 7408 HTELI1034P 3 . 1346049.00000
US000323 Removal 24MAY14 0:00 7408 HTELI1034P 3 . 1812537.00000
US000328 CRE 03FEB13 0:00 7209 HTELI1115P 3 . 2040610.00000
US000328 CRE 18JUL14 0:00 7218 HTELI1152P 3 . 2134438.00000
US000328 Inst 15FEB15 0:00 7508 HTELI1098P 3 . 2180863.00000
US000328 CRE 21MAY15 0:00 7212 HTELI1098P 3 . 2232830.00000
US000328 CRE 01OCT15 0:00 7111 HTELI1215P 2 . 2232830.00000
US000329 Removal 21MAR14 0:00 7110 HTELI1148P 2 . 2130325.00000
US000329 CRE 18SEP14 0:00 7517 HTELI1211P 3 . 2130325.00000
US000331 CRE 02SEP13 0:00 7207 HTELI020 2 . 2059478.00000
US000331 Removal 17JUN15 0:00 7207 HTELI020 2 . 2689105.00000
US000331 Inst 19APR16 0:00 7114 HTELI1147P 3 . 2689105.00000
US000334 Inst 26JUN13 0:00 7512 HTELI1023P 2 . 2535592.00000
US000334 CRE 04JUL14 0:00 7217 HTELI1145P 2 . 2815903.00000
;
run;
What I want to do is "count" the number of times an MC changes from one TPMC to another. So the final output should look like this:
MC ET Date Time TPMC Change PXMC Site Dia MV
US0001 CRE 29May2013 0:00 7611 0 HTELI112 1 2734440
US0001 CRE 31Jan2014 0:00 7402 1 HTELI101 2 2735017
US000323 Removal 31Oct2012 0:00 7416 0 HTELI107 3 1346049
US000323 Inst 11Jan2013 0:00 7408 1 HTELI103 3 1346049
US000323 Removal 24May2014 0:00 7408 0 HTELI103 3 1812537
US000328 CRE 03Feb2013 0:00 7209 1 HTELI111 3 2040610
US000328 CRE 18Jul2014 0:00 7218 1 HTELI115 3 2134438
US000328 Inst 15Feb2015 0:00 7508 1 HTELI109 3 2180863
US000328 CRE 21May2015 0:00 7212 1 HTELI109 3 2232830
US000328 CRE 01Oct2015 0:00 7111 1 HTELI121 2 2232830
US000329 Removal 21Mar2014 0:00 7110 0 HTELI114 2 2130325
US000329 CRE 18Sep2014 0:00 7517 1 HTELI121 3 2130325
US000331 CRE 02Sep2013 0:00 7207 0 HTELI020 2 2059478
US000331 Removal 17Jun2015 0:00 7207 0 HTELI020 2 2689105
US000331 Inst 19Apr2016 0:00 7114 1 HTELI114 3 2689105
US000334 Inst 26Jun2013 0:00 7512 0 HTELI102 2 2535592
US000334 CRE 04Jul2014 0:00 7217 1 HTELI114 2 2815903
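What happens here is basically that the first row's 'Change' column is always 0; after that, if the TPMC in the current row differs from the TPMC in the previous row, the 'Change' column shows 1, otherwise it shows 0.
How can I do this?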
When I ran the code as Chris J wrote it (in his answer), I got the following result, which unfortunately does not meet the requirement:
proc sort data=haveb1 ;
by MC Date Time ;
run ;
data want ;
set haveb1 ;
by MC Date Time TPMC notsorted ;
if first.MC then Change = 0 ;
else
if first.TPMC then Change + 1 ;
run ;
Result:
US0001 Lath 02JAN13 19:24 876 2660403.00000 1 0
US0001 CRE 29MAY13 0:00 7611 HTELI1124P 1 . 2734440.00000 1 0
US0001 CRE 31JAN14 0:00 7402 HTELI1015P 2 . 2735017.00000 1 0
US0001 Lath 12JAN15 7:00 . 2900334.00000 1 0
US000323 Lath 13OCT12 19:37 852.2 1332753.00000 1 0
US000323 WI 25OCT12 0:00 . 1342148.00000 1 0
US000323 Remov 31OCT12 0:00 7416 HTELI1079P 3 . 1346049.00000 1 0
US000323 Lath 31OCT12 14:03 890.5 1346049.00000 1 0
US000323 Installation 11JAN13 0:00 7408 HTELI1034P 3 . 1346049.00000 1 0
US000323 Lath 16.marras.13 19:52 888.7 1417443.00000 1 0
US000323 Lath 12OCT13 13:49 886.7 1606899.00000 1 0
US000323 Lath 12OCT13 14:17 886.7 1606899.00000 1 0
US000323 Remov 24MAY14 0:00 7408 HTELI1034P 3 . 1812537.00000 1 0
US000328 Meas 17OCT12 16:11 . 1941116.00000 . 0
US000328 Meas 17OCT12 16:11 852.2 1941116.00000 . 1
US000328 Meas 18OCT12 10:53 849.8 1943064.00000 . 0
US000328 Meas 18OCT12 10:53 849.8 1942090.00000 . 1
US000328 Meas 18OCT12 10:53 852.1 1943064.00000 . 2
US000328 Meas 18OCT12 10:53 852.1 1942090.00000 . 3
US000328 Meas 20OCT12 10:17 849.7 1944562.00000 . 0
US000328 Meas 20OCT12 10:17 851.9 1944562.00000 . 1
Answer 0 (score: 1)
Use first. processing together with the notsorted option:
proc sort data=haveb1 ;
by MC Date Time ;
run ;
data want ;
set haveb1 ;
by MC Date Time TPMC notsorted ;
if first.MC then Change = 0 ;
else
if first.TPMC then Change + 1 ;
run ;
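The step above uses a sum statement (Change + 1), so Change is retained and accumulates within each MC. Also, because Date and Time are listed ahead of TPMC in the BY statement, first.TPMC is set whenever Date or Time changes, not only when TPMC changes. As a minimal sketch, assuming that what is wanted is a 0/1 flag per row and that haveb1 has already been sorted by MC Date Time, the BY statement could list only MC and TPMC. The data set name want_flag is hypothetical.

```sas
/* Sketch only: assumes haveb1 is already in MC, Date, Time order. */
data want_flag ;
  set haveb1 ;
  /* notsorted lets SAS detect runs of equal TPMC values
     without requiring TPMC itself to be in sorted order */
  by MC TPMC notsorted ;
  /* 1 when TPMC starts a new run inside the same MC;
     0 on the first row of each MC and when TPMC repeats */
  Change = (first.TPMC and not first.MC) ;
run ;
```

With this form, a row with a blank TPMC counts as a value of its own, so such rows would have to be filtered out or handled separately if they should not trigger a change.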
Answer 1 (score: 1)
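Consider the following two proc sql steps, run after an initial data step (which can also be handled while importing the data). If SAS allowed CTEs like other RDBMSs, this could be handled in one run. Of course, you can still embed the first query in place of every want1 in the second proc sql step.
Data step
Before that, add a primary id column using the row number _N_ (a primary key is customary in most SQL database tables), which serves as a tie-breaker for records with the same date: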
data haveb1;
set haveb1;
id = _N_;
run;
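The first SQL step uses a correlated aggregate subquery that, for each row, counts the earlier rows in the same MC group (ordered by Date, then id) whose TPMC differs and has a non-zero length. Note: TPMC must be a character variable for the code below to work.
In the second SQL step, a CASE expression sets Change to 0 for rows with a missing TPMC value.
SAS code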
proc sql;
CREATE TABLE want1 AS
SELECT h.MC, h.ET, h.Date, h.Time,
(SELECT Count(*) FROM haveb1 sub
WHERE sub.MC = h.MC AND sub.TPMC ne h.TPMC AND LENGTH(sub.TPMC) > 1
AND (sub.Date < h.Date OR sub.Date = h.Date AND sub.id < h.id )) AS Change,
h.TPMC, h.PXMC, h.Site, h.Dia, h.MV
FROM haveb1 h;
quit;
proc sql;
CREATE TABLE want2 AS
SELECT w.MC, w.ET, w.Date, w.Time,
CASE WHEN LENGTH(w.TPMC) > 1
THEN (SELECT Count(*)
FROM (SELECT DISTINCT t.MC, t.Change FROM want1 t) sub
WHERE sub.MC = w.MC AND sub.Change < w.Change)
ELSE 0
END AS Change,
w.TPMC, w.PXMC, w.Site, w.Dia, w.MV
FROM want1 w;
quit;
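The queries above call LENGTH() on TPMC, which is why TPMC has to be a character variable. In the sample data step it is read with $, so nothing needs to change there. If TPMC had instead been imported as numeric, a conversion along the following lines could be run first. This is a sketch under that assumption; the names haveb1_char and TPMC_num are hypothetical.

```sas
/* Sketch only: applies when TPMC arrived as a numeric variable. */
data haveb1_char ;
  set haveb1 (rename=(TPMC=TPMC_num)) ;
  length TPMC $ 8 ;
  /* keep missing values blank so that LENGTH(TPMC) > 1 still excludes them */
  if missing(TPMC_num) then TPMC = ' ' ;
  else TPMC = strip(put(TPMC_num, best8.)) ;
  drop TPMC_num ;
run ;
```

The two proc sql steps would then read from haveb1_char instead of haveb1.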