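I have a dataset with an unbalanced panel of observations, and I want to forward- and backward-fill missing and/or "wrong" ticker observations with the most recent non-missing string.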
id time ticker_have ticker_want
------------------------------
1 1 ABCDE YYYYY
1 2 . YYYYY
1 3 . YYYYY
1 4 YYYYY YYYYY
1 5 . YYYYY
------------------------------
2 4 . ZZZZZ
2 5 ZZZZZ ZZZZZ
2 6 . ZZZZZ
------------------------------
3 1 . .
------------------------------
4 2 OOOOO OOOOO
4 3 OOOOO OOOOO
4 4 OOOOO OOOOO
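Basically, if an observation already has a ticker but that ticker differs from the most recent non-missing ticker, it should be replaced with the most recent one.
So far, I have managed to fill in the missing observations with this code: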
proc sql;
   create table have as
   select * from old_have
   order by id, time desc;
quit;

data want;
   set have;
   by id;
   /* Declare TEMP before RETAIN so its length is fixed at 5 */
   length temp $ 5;
   retain temp;
   /* Reset TEMP when the BY group changes */
   if first.id then temp = ' ';
   /* Remember the ticker whenever it is non-missing */
   if ticker ne ' ' then temp = ticker;
   /* Otherwise fill the missing ticker with the retained value */
   else ticker = temp;
   drop temp;
run;
Now I need to handle the case where the non-missing value cannot be reached with last.ticker or first.ticker.
How can I do this with a DATA step, PROC SQL, or any other SAS procedure?
Answer 0 (score: 1)
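You can do this in several ways, but PROC SQL with a few nested subqueries is one solution.
(Read it from the inside out: start with #1, then #2, then #3. If it helps, you can build each subquery into its own dataset first.)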
proc sql;
   create table want as
   /* #3 - match last ticker on id */
   select a.id, a.time, a.ticker_have, b.ticker_want
   from have a
   left join
      /* #2 - id and last ticker */
      (select x.id, x.ticker_have as ticker_want
       from have x
       inner join
          /* #1 - max time with a ticker per id */
          (select id, max(time) as mt
           from have
           where not missing(ticker_have)
           group by id) as y
       on x.id = y.id and x.time = y.mt) as b
   on a.id = b.id;
quit;
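The suggestion to build each subquery into its own dataset can be sketched as follows. This is only an illustration of the same three steps, not a different method; the intermediate table names step1 and step2 are hypothetical, and have is assumed to contain id, time and ticker_have as in the question.

```sas
proc sql;
   /* #1 - latest time with a non-missing ticker, per id
      (step1 is a hypothetical intermediate table name) */
   create table step1 as
   select id, max(time) as mt
   from have
   where not missing(ticker_have)
   group by id;

   /* #2 - the ticker recorded at that latest time
      (step2 is a hypothetical intermediate table name) */
   create table step2 as
   select x.id, x.ticker_have as ticker_want
   from have x
   inner join step1 y
   on x.id = y.id and x.time = y.mt;

   /* #3 - attach that ticker to every row of the same id */
   create table want as
   select a.id, a.time, a.ticker_have, b.ticker_want
   from have a
   left join step2 b
   on a.id = b.id
   order by a.id, a.time;
quit;
```

Inspecting step1 and step2 separately makes it easier to see why an id with no ticker at all (id 3 in the sample) ends up with a missing ticker_want: it never appears in step1, so the left join finds no match.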
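Answer 1 (score: 1)
Consider using a DATA step to retrieve the last non-missing ticker for each id (by time), then join it back to the main table. In the join, use a CASE expression to assign the new ticker only when the existing one is missing.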
/* Keep the last non-missing ticker per id.
   Tickers must already be sorted by id and time. */
data LastTicker;
   set Tickers (where=(ticker_have ne ""));
   by id;
   if last.id;
run;
proc sql;
create table Tickers_Want as
select t.id, t.time, t.ticker_have,
case when t.ticker_have = ""
then l.ticker_have
else t.ticker_have
end as tickerwant
from Tickers t
left join LastTicker l
on t.id = l.id
order by t.id, t.time;
quit;
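Because the CASE expression keeps any non-missing ticker_have, a ticker that differs from the last one (ABCDE for id 1) is left as it is, whereas the question's ticker_want column replaces it. If that replacement is wanted, one possible variant is to take the joined value unconditionally. This is a minimal sketch that assumes the LastTicker dataset built above; the output table name Tickers_Want2 is hypothetical.

```sas
proc sql;
   /* Tickers_Want2 is a hypothetical output table name */
   create table Tickers_Want2 as
   select t.id, t.time, t.ticker_have,
          /* always use the last non-missing ticker of the id,
             even when ticker_have is already populated */
          l.ticker_have as tickerwant
   from Tickers t
   left join LastTicker l
   on t.id = l.id
   order by t.id, t.time;
quit;
```

An id with no ticker at all still gets a missing tickerwant, since it has no row in LastTicker.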
Data
data Tickers;
length ticker_have $ 5;
input id time ticker_have $;
datalines;
1 1 ABCDE
1 2 .
1 3 .
1 4 YYYYY
1 5 .
2 4 .
2 5 ZZZZZ
2 6 .
3 1 .
4 2 OOOOO
4 3 OOOOO
4 4 OOOOO
;
Output
Obs  id  time  ticker_have  tickerwant
  1   1     1  ABCDE        ABCDE
  2   1     2               YYYYY
  3   1     3               YYYYY
  4   1     4  YYYYY        YYYYY
  5   1     5               YYYYY
  6   2     4               ZZZZZ
  7   2     5  ZZZZZ        ZZZZZ
  8   2     6               ZZZZZ
  9   3     1
 10   4     2  OOOOO        OOOOO
 11   4     3  OOOOO        OOOOO
 12   4     4  OOOOO        OOOOO
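The question also asks whether this can be done in a DATA step alone. One possibility, offered as a sketch rather than as either answerer's method, is a pair of DO UNTIL loops over the same BY group (often called a double DOW loop): the first pass finds the last non-missing ticker for the id, the second pass writes every row out with it. It assumes the Tickers dataset above, sorted by id and time; the output name want_dow is hypothetical, and ticker_want is named after the column in the question.

```sas
data want_dow;                      /* want_dow is a hypothetical name */
   length ticker_want $ 5;

   /* Pass 1: scan the whole id group in time order; the last
      non-missing ticker seen is the one that remains */
   do until (last.id);
      set Tickers;
      by id;
      if not missing(ticker_have) then ticker_want = ticker_have;
   end;

   /* Pass 2: re-read the same group and output each row
      together with the ticker found in pass 1 */
   do until (last.id);
      set Tickers;
      by id;
      output;
   end;
run;
```

Each iteration of the DATA step handles one id, and ticker_want is reset to missing at the start of every iteration, so an id with no ticker (id 3) keeps a missing ticker_want. For the sample data this should give YYYYY on every row of id 1, including the row that originally held ABCDE.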