How to check for missing values and replace them

Date: 2017-06-17 19:15:14

Tags: sas

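I have a dataset covering many companies, with a variable for each company's number of employees. In some years the number of employees was not reported, so those years are blank while the year before and the year after both contain a value. The data look like this: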

COMPANY     YEAR       NO. EMPLOYEES 
Company 1   2007       4
Company 1   2008       5
Company 1   2009       5
Company 1   2010       5
Company 2   2007       11 
Company 2   2008       10 
Company 2   2009   
Company 2   2010       10 
Company 3   2007       3 
Company 3   2008       4 
Company 3   2009   
Company 3   2010       3 

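I would like to search the dataset for any such occurrences, create an indicator for those years, and then replace each blank with the previous year's value. If there is no previous year to use as a replacement, or the previous year is also blank, I want to use the year after the blank instead. I would like the dataset to end up like this: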

COMPANY     YEAR       NO. EMPLOYEES 
Company 1   2007       4
Company 1   2008       5
Company 1   2009       5
Company 1   2010       5
Company 2   2007       11 
Company 2   2008       10 
Company 2   2009       10
Company 2   2010       10 
Company 3   2007       3 
Company 3   2008       4 
Company 3   2009       4
Company 3   2010       3 

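To sum up: first I need to check whether a missing value actually lies between two reported years. It is important that the code does not replace missing values before the first or after the last non-missing value, because some companies exit the sample. Then, for any blank year that falls between two non-blank years, I want to replace the blank as described above.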
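Here is a minimal sketch of the rule described above. It is written in Python rather than SAS, only so the logic is easy to check. The function name `fill_interior_gaps` is hypothetical. It assumes one company's counts are already in year order, with `None` standing in for a missing value. Only gaps that have a reported year somewhere before and after them are flagged and filled. Leading and trailing gaps are left untouched.

```python
# Sketch (Python, not SAS) of the question's rule: flag and fill only gaps
# that lie between two reported years. fill_interior_gaps is a hypothetical name.
from typing import Optional


def fill_interior_gaps(
    values: list[Optional[int]],
) -> tuple[list[Optional[int]], list[bool]]:
    """values: one company's employee counts in year order, None = not reported.

    Returns (filled values, imputed-indicator flags). Missing years before the
    first or after the last reported year (e.g. a company that has left the
    sample) stay None and are not flagged.
    """
    filled = list(values)
    flags = [False] * len(values)
    reported = [i for i, v in enumerate(values) if v is not None]
    if not reported:
        return filled, flags
    first, last = reported[0], reported[-1]
    for i in range(first + 1, last):
        if values[i] is None:
            flags[i] = True
            # Prefer the previous year; if it is also blank, use the next year.
            # first <= i-1 and i+1 <= last, so both indexes are valid. In a run
            # of three or more gaps the next year may itself be blank, in which
            # case the value stays None (but is still flagged).
            if values[i - 1] is not None:
                filled[i] = values[i - 1]
            else:
                filled[i] = values[i + 1]
    return filled, flags


print(fill_interior_gaps([11, 10, None, 10]))  # Company 2 from the question
```

The indicator list plays the role of the "flag those years" variable. Keeping it separate from the filled values makes it easy to tell imputed years from reported ones later.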

1 answer:

Answer 0 (score: 0)

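The approach I would use:

1. Sort the dataset by company/year.
2. Use the LAG function to replace a missing value, provided it is not the first observation in its company group.
3. Reverse the sort order.
4. Repeat step 2 on the reversed dataset.
5. Return the dataset to its original order.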
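To make the effect of the two LAG passes easy to check outside SAS, here is a minimal Python sketch (not SAS) of steps 1-5. The names `lag_fill`, `two_pass_fill`, `Row` and `HAVE` and the `(company, year, n_employees)` tuple layout are hypothetical, and `None` stands in for a SAS missing value.

As with `LAG` called before the assignment, each pass fills a gap from the neighbouring row's value as it was before that pass. The result is that a single gap is filled from the previous year, and the second of two consecutive gaps is filled from the following year.

```python
# Sketch (Python, not SAS) of the sort / LAG / reverse / LAG / sort approach.
from itertools import groupby
from typing import Optional

Row = tuple[str, int, Optional[int]]  # hypothetical layout: (company, year, n_employees)


def lag_fill(values: list[Optional[int]]) -> list[Optional[int]]:
    """One pass: replace a missing value with the previous row's value.

    Like LAG() called before the assignment, the previous value is the one
    that row had before this pass, so only the first gap in a run is filled.
    Row 0 has no previous value (the first.COMPANY reset).
    """
    out = list(values)
    for i in range(1, len(values)):
        if out[i] is None:
            out[i] = values[i - 1]
    return out


def two_pass_fill(rows: list[Row]) -> list[Row]:
    result: list[Row] = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # step 1: company/year order
    for company, grp in groupby(ordered, key=lambda r: r[0]):
        company_rows = list(grp)
        values = [r[2] for r in company_rows]
        forward = lag_fill(values)  # step 2: fill from the previous year
        backward = lag_fill(forward[::-1])[::-1]  # steps 3-4: fill from the next year
        result.extend((company, r[1], v) for r, v in zip(company_rows, backward))
    return result  # step 5: rows are already in company/year order


# The sample data used below, with Company 3's 2007 value missing.
HAVE: list[Row] = [
    ("Company 1", 2007, 4), ("Company 1", 2008, 5), ("Company 1", 2009, 5), ("Company 1", 2010, 5),
    ("Company 2", 2007, 11), ("Company 2", 2008, 10), ("Company 2", 2009, None), ("Company 2", 2010, 10),
    ("Company 3", 2007, None), ("Company 3", 2008, 3), ("Company 3", 2009, 4), ("Company 3", 2010, 3),
]

for row in two_pass_fill(HAVE):
    print(row)
```

This approach fills leading and trailing gaps as well as interior ones. The forward pass fills a missing last year from the year before it, and the backward pass fills a missing first year from the year after it. In a run of three or more consecutive gaps, the middle values stay missing.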

Note that I changed the original data for Company 3 to provide a case for your second scenario (a missing value with no previous record).

DATA HAVE;
    /* Column input: COMPANY in cols 1-10, YEAR in cols 13-17, N_EMPLOYEES in cols 24-27 */
    input COMPANY $ 1-10 YEAR 13-17 N_EMPLOYEES 24-27;
    datalines;
Company 1   2007       4
Company 1   2008       5
Company 1   2009       5
Company 1   2010       5
Company 2   2007       11 
Company 2   2008       10 
Company 2   2009          
Company 2   2010       10 
Company 3   2007          
Company 3   2008       3  
Company 3   2009       4  
Company 3   2010       3  
;
run;

/* Step 1: sort by company and year */
PROC SORT DATA=HAVE
    OUT=DOSOMEWORKHERE;
    BY COMPANY YEAR;
RUN;


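/* Step 2: forward pass. LAG is called on every row (not inside an IF) so its
   queue stays in step; it is reset at the first row of each company so a value
   never carries over from the previous company. */
DATA DOSOMEWORKHERE (drop=PREV_N_EMPLOYEES);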
    set DOSOMEWORKHERE;
    by COMPANY; 
    PREV_N_EMPLOYEES = LAG(N_EMPLOYEES); 
    if first.COMPANY then
        do;
            PREV_N_EMPLOYEES = .;
        end;
    if N_EMPLOYEES = . then N_EMPLOYEES = PREV_N_EMPLOYEES; 
run;

/* Step 3: reverse the order so the next pass looks at the following year */
PROC SORT DATA=DOSOMEWORKHERE
    OUT=DOSOMEWORKHERE;
    BY DESCENDING COMPANY DESCENDING YEAR ;
RUN;

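/* Step 4: same fill on the reversed data, so a value still missing after
   step 2 is taken from the following year */
DATA DOSOMEWORKHERE (drop=PREV_N_EMPLOYEES);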
    set DOSOMEWORKHERE;
    by DESCENDING COMPANY; 
    PREV_N_EMPLOYEES = LAG(N_EMPLOYEES); 
    if first.COMPANY then
        do;
            PREV_N_EMPLOYEES = .;
        end;
    if N_EMPLOYEES = . then N_EMPLOYEES = PREV_N_EMPLOYEES; 
run;

/* Step 5: restore company/year order */
PROC SORT DATA=DOSOMEWORKHERE
    OUT=WANT;
    BY COMPANY YEAR;
RUN;

Result:

[Image: result table]