Question

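I have a longitudinal dataset of college students that contains various demographic and performance-related information. I am interested in studying enrollment status, but the dataset has no information on "not enrolled" periods (i.e., when a student leaves for a semester and then returns). I need a way to create observations for these "not enrolled" time periods. The dataset is structured as follows: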

  Student_ID  Enrollement_Status  Year  Semester  Variable1  Variable2
  1           1                   2011  1
  1           2                   2011  2
  1           1                   2012  2
  2           2                   2011  1
  2           2                   2011  2

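I need an observation for Student_ID 1 where Enrollement_Status is 0 (1 and 2 mean part-time/full-time) for Semester = 1 in Year = 2012. Year is based on the academic year (not the calendar year), so it is the same for fall and spring. I also need to copy over the information in Variable1 through Variablen, since it does not change over time. There may be gaps of more than one semester, in which case I need two (or more) observations, one for each semester the student was not enrolled.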

Thanks! Brian

Answer 1

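You can do this with proc sql. The idea is to generate all the rows you want, which is every combination of student and year/semester, and then use a left join to bring in the existing information. Columns that have no match will be NULL (missing in SAS):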

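proc sql;
/* coalesce needs a fallback value: 0 marks a semester with no enrollment record */
select s.student_id, coalesce(ss.Enrollement_Status, 0) as Enrollement_Status,
       ys.year, ys.semester, s.variable1, s.variable2
from (select distinct year, semester from students) ys cross join
     /* variable1 and variable2 are time-invariant, so carry them with the student */
     (select distinct student_id, variable1, variable2 from students) s left join
     students ss
     on ss.year = ys.year and ss.semester = ys.semester and
        ss.student_id = s.student_id;
quit;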
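
One thing to watch for with the sample data in the question: the pair Year = 2012, Semester = 1 does not appear for any student, so a list of distinct (year, semester) pairs taken from the data would not contain it. A possible variant, sketched below, crosses the distinct years with the distinct semesters separately so that every year/semester combination is generated. It assumes the table is named students with the column names from the question, and that variable1 and variable2 are constant within each student; the output name want and the aliases s, y, m and g are just illustrative.

```sas
proc sql;
  create table want as
  select g.student_id,
         /* 0 = not enrolled in that semester */
         coalesce(ss.Enrollement_Status, 0) as Enrollement_Status,
         g.year, g.semester, g.variable1, g.variable2
  from (
         /* full grid: every student x every year x every semester */
         select s.student_id, s.variable1, s.variable2, y.year, m.semester
         from (select distinct student_id, variable1, variable2 from students) as s,
              (select distinct year from students) as y,
              (select distinct semester from students) as m
       ) as g
       left join students as ss
         on  ss.student_id = g.student_id
         and ss.year       = g.year
         and ss.semester   = g.semester
  order by g.student_id, g.year, g.semester;
quit;
```

Note that this also creates rows for semesters before a student's first record and after their last one, which may or may not be what you want.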

Answer 2

Data STUDENTS;
    Input Student_ID Enrollment_Status Year Semester;
    Datalines;
1 1 2011 1
1 2 2011 2
1 1 2012 2
2 2 2011 1
2 2 2011 2
;
Run;

I learned about SPARSE from @Joe's comment (paper). It appears to create a dataset containing every possible semester.

proc freq data=STUDENTS noprint;
tables student_id*year*semester/sparse out=all(drop=percent count);
run;

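Then join the two datasets to fill in the holes. This is not as slick as @GordonLinoff's answer, so if I missed something, I am always interested in learning more about SQL.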

Proc Sql;
Create table WANT as
select a.student_id
, CASE
        When s.Enrollment_Status=. THEN 0
        ELSE s.Enrollment_Status
        END as Enrollment_Status
, a.year, a.semester
from  all a left join students s on a.student_id=s.student_id and a.year=s.year and a.semester=s.semester
;
Quit;
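
The question also asks for the time-invariant variables to be copied onto the new rows. The following is a rough sketch of one way to extend the join above. It assumes that STUDENTS also contains Variable1 and Variable2 (the data step above does not read them) and that they are constant within each student; the output name WANT_FULL and the alias v are made up for this example.

```sas
proc sql;
  create table WANT_FULL as
  select a.student_id,
         /* 0 = not enrolled in that semester */
         coalesce(s.Enrollment_Status, 0) as Enrollment_Status,
         a.year, a.semester,
         v.Variable1, v.Variable2
  from all as a
       left join students as s
         on  a.student_id = s.student_id
         and a.year       = s.year
         and a.semester   = s.semester
       /* one row per student holding the time-invariant values (assumed columns) */
       left join (select distinct student_id, Variable1, Variable2
                  from students) as v
         on a.student_id = v.student_id
  order by a.student_id, a.year, a.semester;
quit;
```

Here coalesce(s.Enrollment_Status, 0) does the same job as the CASE expression in the query above.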

Creating missing observations
