我有以下内容:
data have;
input id day $ col1;
datalines;
84 day_1_x 302
84 day_1_y 443
18 day_2_x 774
18 day_2_y 281
61 day_5_x 488
61 day_5_y 564
;
run;
我想基于x
创建两个新列y
和day
。
| id | day | x | y |
|----+-----+-----+-----|
| 84 | 1 | 302 | 443 |
| 18 | 2 | 774 | 281 |
| 61 | 5 | 488 | 564 |
|----+-----+-----+-----|
我天真的尝试并没有让我在那里。
data dont_quite_want;
set have;
length x 8 y 8;
if scan(day, 3, '_') = 'x' then x = col1;
else y = col1;
run;
| Obs | id | day | col1 | x | y |
|-----+----+---------+------+-----+-----|
| 1 | 84 | day_1_x | 302 | 302 | . |
| 2 | 84 | day_1_y | 443 | . | 443 |
| 3 | 18 | day_3_x | 774 | 774 | . |
| 4 | 18 | day_3_y | 281 | . | 281 |
| 5 | 61 | day_5_x | 488 | 488 | . |
| 6 | 61 | day_5_y | 564 | . | 564 |
|-----+----+---------+------+-----+-----|
如果有一次性的方法,我不会感到惊讶,但我想知道,如何将转换分解为单独的步骤?
答案 0 :(得分:3)
我可能这样做的方法是进行转置 - 这需要转置,你可以一次性对数据进行查看,如果这对你很重要的话。
data for_transpose/view=for_transpose;
set have(rename=day=long_day);
day = scan(long_day,2,'_');
varname = scan(long_day,3,'_');
run;
proc transpose data=for_transpose out=want;
by id day notsorted;
var col1;
id varname;
run;
但是,在一个数据步骤中也很容易做到 - 只是更加硬编码。
data want;
set have(rename=day=long_day);
by id notsorted; *your two ID rows still need to be adjacent;
length x 8 y 8;
retain x y day;
if first.id then call missing(x,y); *reset them;
day = scan(long_day,2,'_');
if scan(long_day, 3, '_') = 'x' then x = col1;
else y = col1;
if last.id then output;
run;
答案 1 :(得分:2)
这是转置版本。 @Joe为您提供了完整的数据步骤。
/*Pull out the day and variable name from the day column
Use and index so you don't have to sort*/
data have(index=(id));
set have;
d = input(scan(day,2,"_"),best.);
_name_ = scan(day,3,"_");
run;
/*Transpose the data keeping the variables you want and renaming
as needed */
proc transpose data=have out=want(drop=_name_ rename=(d=day));
by id d;
var col1;
id _name_;
run;