SAS proc freq for distinct values of a variable

Asked: 2016-05-31 16:37:23

Tags: sql sas distinct frequency

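I have a table listing the courses each student selected over two semesters. None of these students validated their first semester, so valid_or_not_of_semester='N' for every row where semester='1st'.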

student    semester    course_selected    valid_or_not_of_semester
  A          1st         math               N
  A          1st         english            N
  A          2nd         math               Y
  A          2nd         english            Y
  B          1st         math               N
  B          2nd         math               Y
  B          2nd         english            Y
  C          1st         math               N
  C          2nd         math               N

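For the students who selected math (or english) in the first semester, I want to check whether they selected math (or english) in the second semester. If they did, I want to build a cross-tabulation that counts how many of those students validated the second semester and how many did not: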

--------------------------------------------------------------------------
   1st semester \ 2nd semester |           Math      |  English
     invalid     \             |---------------------|--------------------
     students     \            |  valid |  invalid   | valid |  invalid  
--------------------------------------------------------------------------
          Math                 |   2    |    1       |  2    |     0
--------------------------------------------------------------------------
         English               |   1    |    0       |  1    |     0
--------------------------------------------------------------------------

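Each row covers the students who did not validate the first semester and who selected that course in the first semester. The columns split the students who selected each course in the second semester by whether that semester is valid or invalid. More precisely,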

--------------------------------------------------------------------------
   1st semester \ 2nd semester |           Math      |  English
     invalid     \             |---------------------|--------------------
     students     \            |  valid |  invalid   | valid |  invalid   
--------------------------------------------------------------------------
          Math                 |   2   |       1     |    2  |     0
                                   |           |          |
                                  \ /         \ /        \ /
                         (students A&B)   (student C)  (students A&B)

I tried proc sql:

data math;
   merge have
     have (where=(semester='1st') in=these);
   by student;
   if these then output;
run;

proc sql;
    create table result as
    select count(distinct student) as nb_student
    from math (where=(semester='2nd'))
    group by course_selected, valid_or_not_of_semester;
quit;

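and then the same thing for english.

But is there a way to get the results for both courses in one step? How could I use proc freq for this?

Looking forward to your answers.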

1 answer:

Answer 0 (score: 1)

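This does not give you exactly the table you asked for, but it does produce the values you are interested in. The idea is to transpose the original dataset so that each student becomes a single row, and then count the observations.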

You may also want to look at proc tabulate, but you could run into problems there because some students would be counted more than once.

data temp;
   input student $ semester $ course_selected $ valid_or_not_of_semester $;
   datalines;
A 1st math N
A 1st english N
A 2nd math Y
A 2nd english Y
B 1st math N
B 2nd math Y
B 2nd english Y
C 1st math N
C 2nd math N
;
run;

/* the BY statement in proc transpose needs the data sorted by student */
proc sort data = temp;
   by student;
run;

proc transpose data = temp out = temp2;
    by student;
    id course_selected semester;
    var valid_or_not_of_semester;
run;

proc sql;
    /* one row of counts; columns are named <1st-semester course>_<2nd-semester course>_<status> */
    create table temp3 as select
        sum(case when math1st = "N" and math2nd = "Y" then 1 else 0 end) as math_math_valid,
        sum(case when math1st = "N" and math2nd = "N" then 1 else 0 end) as math_math_invalid,
        sum(case when english1st = "N" and math2nd = "Y" then 1 else 0 end) as english_math_valid,
        sum(case when english1st = "N" and math2nd = "N" then 1 else 0 end) as english_math_invalid,
        sum(case when math1st = "N" and english2nd = "Y" then 1 else 0 end) as math_english_valid,
        sum(case when math1st = "N" and english2nd = "N" then 1 else 0 end) as math_english_invalid,
        sum(case when english1st = "N" and english2nd = "Y" then 1 else 0 end) as english_english_valid,
        sum(case when english1st = "N" and english2nd = "N" then 1 else 0 end) as english_english_invalid
        from temp2;
quit;
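
Since the question asks specifically about proc freq, here is one possible alternative, offered as a rough sketch rather than a tested solution. It assumes the temp dataset created by the data step above, with at most one row per student, semester and course. The names pairs, course_1st, course_2nd and valid_2nd are made up for this illustration. The idea is to pair each first-semester course with every second-semester course of the same student, so that one row equals one student for a given course pair, and then let proc freq or proc tabulate do the counting.

/* Build one row per student and (1st-semester course, 2nd-semester course) pair. */
/* Assumption: temp holds no duplicate student/semester/course rows.              */
proc sql;
    create table pairs as
    select f.student,
           f.course_selected          as course_1st,
           s.course_selected          as course_2nd,
           s.valid_or_not_of_semester as valid_2nd
    from temp as f
         inner join temp as s
         on f.student = s.student
    where f.semester = '1st'
      and f.valid_or_not_of_semester = 'N'
      and s.semester = '2nd';
quit;

/* List-style frequency table: one line per combination that occurs in the data */
proc freq data = pairs;
    tables course_1st * course_2nd * valid_2nd / list nocum nopercent;
run;

/* Layout closer to the requested table: rows = 1st-semester course, */
/* columns = 2nd-semester course split by valid (Y) / invalid (N)    */
proc tabulate data = pairs;
    class course_1st course_2nd valid_2nd;
    table course_1st, course_2nd * valid_2nd * n / misstext = '0' printmiss;
run;

Combinations with no students (for example math followed by an invalid english) do not appear in the proc freq list by default; the sparse option on the tables statement may help if you need the zero rows. A student who took both courses in the first semester, like A, contributes to both rows, which matches the table in the question.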