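I have a dataset covering many companies, with a variable holding each company's number of employees. In some years the number of employees was not reported, so those years are blank even though the year before and the year after contain a value. The data looks like: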
COMPANY YEAR NO. EMPLOYEES
Company 1 2007 4
Company 1 2008 5
Company 1 2009 5
Company 1 2010 5
Company 2 2007 11
Company 2 2008 10
Company 2 2009
Company 2 2010 10
Company 3 2007 3
Company 3 2008 4
Company 3 2009
Company 3 2010 3
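I would like to search the dataset for any such occurrences, create an indicator for those years, and then replace each blank with the previous year's value. If there is no previous year to use as a substitute, or the previous year is also blank, the value from the year after the blank should be used instead. I would like the dataset to look like: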
COMPANY YEAR NO. EMPLOYEES
Company 1 2007 4
Company 1 2008 5
Company 1 2009 5
Company 1 2010 5
Company 2 2007 11
Company 2 2008 10
Company 2 2009 10
Company 2 2010 10
Company 3 2007 3
Company 3 2008 4
Company 3 2009 4
Company 3 2010 3
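To sum up: first I need to check whether there really is a missing value in between two years. It is important that the code does not replace missing values that fall before the first or after the last year with a non-missing value, because some companies exit the sample. Next, for any blank year that lies between two non-blank years, I want to replace the blank as described above.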
Answer 0 (score: 0)
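The approach I would use:
1. Sort the dataset by company and year.
2. Use the LAG function to replace a missing value, provided it is not the first observation in its company group.
3. Reverse the sort order.
4. Repeat step 2 on the dataset in reversed order.
5. Return the dataset to its original order.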
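Note that I have changed the original data for Company 3 so that it provides a case for your second scenario (a missing value with no previous record).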
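DATA HAVE;
    /* TRUNCOVER keeps short lines (blank N_EMPLOYEES) from reading into the next record */
    infile datalines truncover;
    /* Column input: SAS columns start at 1, and the ranges match the data lines below */
    input COMPANY $ 1-9 YEAR 11-14 N_EMPLOYEES 16-18;
datalines;
Company 1 2007 4
Company 1 2008 5
Company 1 2009 5
Company 1 2010 5
Company 2 2007 11
Company 2 2008 10
Company 2 2009
Company 2 2010 10
Company 3 2007
Company 3 2008 3
Company 3 2009 4
Company 3 2010 3
;
run;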
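/* Step 1: sort by company and year */
PROC SORT DATA=HAVE
          OUT=DOSOMEWORKHERE;
    BY COMPANY YEAR;
RUN;

/* Step 2: fill a blank from the previous year within the same company */
DATA DOSOMEWORKHERE (drop=PREV_N_EMPLOYEES);
    set DOSOMEWORKHERE;
    by COMPANY;
    /* LAG is called on every row so its queue stays in step with the data. */
    /* It returns the value as read, so in a run of blanks only the first is filled here. */
    PREV_N_EMPLOYEES = LAG(N_EMPLOYEES);
    if first.COMPANY then
    do;
        /* Do not carry a value over from the previous company */
        PREV_N_EMPLOYEES = .;
    end;
    if N_EMPLOYEES = . then N_EMPLOYEES = PREV_N_EMPLOYEES;
run;

/* Step 3: reverse the sort order */
PROC SORT DATA=DOSOMEWORKHERE
          OUT=DOSOMEWORKHERE;
    BY DESCENDING COMPANY DESCENDING YEAR;
RUN;

/* Step 4: repeat the fill; the "previous" row is now the following year */
DATA DOSOMEWORKHERE (drop=PREV_N_EMPLOYEES);
    set DOSOMEWORKHERE;
    by DESCENDING COMPANY;
    PREV_N_EMPLOYEES = LAG(N_EMPLOYEES);
    if first.COMPANY then
    do;
        PREV_N_EMPLOYEES = .;
    end;
    if N_EMPLOYEES = . then N_EMPLOYEES = PREV_N_EMPLOYEES;
run;

/* Step 5: restore the original order */
PROC SORT DATA=DOSOMEWORKHERE
          OUT=WANT;
    BY COMPANY YEAR;
RUN;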
Result:
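The steps above do not create the indicator the question asks for, and they also fill blanks at the very start or end of a company's series. As a rough sketch only, and not part of the original answer, the variation below flags each filled year and fills only blanks that lie strictly between two reported years. It assumes the HAVE dataset built above. BOUNDS, SORTED, WANT_FLAGGED, FIRST_YR, LAST_YR, LAST_SEEN and IMPUTED are hypothetical names chosen for illustration. Unlike the LAG approach, it carries the last reported value forward across a run of consecutive blanks.

```sas
/* Sketch under stated assumptions: all dataset and variable names other   */
/* than HAVE, COMPANY, YEAR and N_EMPLOYEES are hypothetical.              */

/* First and last year with a reported value, per company */
proc sql;
    create table BOUNDS as
    select COMPANY,
           min(YEAR) as FIRST_YR,
           max(YEAR) as LAST_YR
    from HAVE
    where N_EMPLOYEES is not missing
    group by COMPANY
    order by COMPANY;
quit;

proc sort data=HAVE out=SORTED;
    by COMPANY YEAR;
run;

data WANT_FLAGGED (drop=FIRST_YR LAST_YR LAST_SEEN);
    merge SORTED BOUNDS;
    by COMPANY;
    retain LAST_SEEN;                       /* last reported value so far   */
    if first.COMPANY then LAST_SEEN = .;    /* reset for each company       */
    IMPUTED = 0;                            /* indicator: 1 = value filled  */
    if N_EMPLOYEES ne . then LAST_SEEN = N_EMPLOYEES;
    else if FIRST_YR < YEAR < LAST_YR then
    do;
        /* Interior gap only: leading and trailing blanks are left alone */
        N_EMPLOYEES = LAST_SEEN;
        IMPUTED = 1;
    end;
run;
```

With the sample data above, this should fill Company 2 in 2009 with 10 and set IMPUTED to 1 for that row, while leaving Company 3 in 2007 blank because no earlier reported year exists. A company with no reported values at all has no row in BOUNDS, so none of its blanks are filled.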