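I have a dataset containing information about students' education at an institution. I want to find out how many different study programs they have taken. I have information on master's and bachelor's degrees, and I want to count the number of distinct study programs at each education level (master, bachelor).
For example, person1 could have: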
Bachelor:
- study1
- study2
- study3
- study3
Master:
- studyA
- studyA
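Then I want a count of 3 bachelor programs (study3 should not be counted twice) and 1 master program. Each study program has its own row, so person1 has 6 rows in the dataset. I want one row per person giving the number of study programs at each education level: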
person number_bachelor number_master
person1 3 1
....etc...
I tried this:
proc sql;
create table new as
select distinct personid, name,
count(study) as number_of_bach
from old
group by personid, edu_level, study;
quit;
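But it doesn't give me what I want. It gives me two rows for person1, with the values 1 and ... in the variable "number_of_bach".
How can I edit this code to get the result I want?
Answer 0: (score: 2)
Code: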
data education;
input person $ level $ program $;
datalines;
person1 bachelor study1
person1 bachelor study2
person1 bachelor study3
person1 bachelor study3
person1 master study1
person2 bachelor study1
person2 master study2
person2 master study1
;
run;
proc sort data = education nodupkey;
by person level program;
run;
proc sql;
select person,
sum(case when level eq 'bachelor' then 1 else 0 end) as num_bachelors,
sum(case when level eq 'master' then 1 else 0 end) as num_masters
from education
group by person;
quit;
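How it works: PROC SORT with NODUPKEY first removes duplicate records (if any). PROC SQL then counts, for each person, the number of programs at the bachelor level and at the master level.
Output: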
person num_bachelors num_masters
person1 3 1
person2 1 2
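If you would rather not overwrite the data set with PROC SORT, the deduplication can probably be folded into the query itself. The following is only a sketch, not part of the answer above: it assumes the same columns, and the names education_raw and program_counts are made up for illustration. It relies on COUNT ignoring missing values, so the CASE expression returns a blank for rows of the other level.

```sas
/* Hypothetical copy of the sample data, with the duplicate row left in */
data education_raw;
    input person $ level $ program $;
    datalines;
person1 bachelor study1
person1 bachelor study2
person1 bachelor study3
person1 bachelor study3
person1 master study1
person2 bachelor study1
person2 master study2
person2 master study1
;
run;

proc sql;
    create table program_counts as
    select person,
           /* Blank (missing) values are not counted, so only programs */
           /* of the matching level contribute to each distinct count  */
           count(distinct case when level = 'bachelor' then program else ' ' end)
               as num_bachelors,
           count(distinct case when level = 'master' then program else ' ' end)
               as num_masters
    from education_raw
    group by person;
quit;
```

This keeps the original rows intact and needs only one pass in SQL, at the cost of a slightly denser SELECT list.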
Answer 1: (score: 0)
Is this what you want?
DATA old;
INPUT personid edu_level $ study $;
DATALINES;
1 bachelor study1
1 bachelor study2
1 bachelor study3
1 bachelor study3
1 master studyA
1 master studyA
1 master studyB
;
PROC SQL;
CREATE TABLE new AS
SELECT personid, edu_level, COUNT (DISTINCT study) AS num_bach
FROM OLD
GROUP BY personid, edu_level;
QUIT;
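The column study is what is called an aggregated column in this query (because COUNT is an aggregate function), so it should not be included in the GROUP BY clause (otherwise the query would also group by study, and COUNT(DISTINCT study) would always be 1).
If you want one row per person, add PROC TRANSPOSE: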
PROC TRANSPOSE DATA = new OUT = new2;
BY personid;
ID edu_level;
RUN;
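To get column names closer to the layout asked for in the question (number_bachelor, number_master), the transpose step could be extended as sketched below. This assumes the table new (personid, edu_level, num_bach) produced by the PROC SQL step above; the PREFIX= value and the explicit sort are my additions, not part of the original answer.

```sas
/* BY-group processing needs the input sorted by personid; */
/* sorting explicitly avoids relying on the order PROC SQL */
/* happened to produce                                      */
proc sort data=new;
    by personid;
run;

/* PREFIX= is prepended to each ID value, giving the columns */
/* number_bachelor and number_master; _NAME_ is dropped      */
/* because it only repeats the name of the transposed column */
proc transpose data=new out=new2(drop=_name_) prefix=number_;
    by personid;
    id edu_level;
    var num_bach;
run;
```

With the sample data in this answer, new2 should contain a single row for personid 1 with number_bachelor = 3 and number_master = 2.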
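(You could also build a more complex query with subqueries and joins instead of transposing, but as long as you don't have millions of rows, the overhead of TRANSPOSE doesn't matter.)
For completeness, here is a pure SQL solution to your problem: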
PROC SQL;
CREATE TABLE new AS
SELECT p.personid, b.num_bachelors, m.num_masters
/* Select unique personids */
FROM (SELECT DISTINCT personid
FROM old) AS p
/* Count number of bachelor-level courses */
LEFT JOIN (SELECT personid,
COUNT(DISTINCT study) AS num_bachelors
FROM old WHERE edu_level = 'bachelor'
GROUP BY personid) AS b on p.personid = b.personid
/* Count number of master-level courses */
LEFT JOIN (SELECT personid,
COUNT(DISTINCT study) AS num_masters
FROM old WHERE edu_level = 'master'
GROUP BY personid) AS m on p.personid = m.personid;
QUIT;
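One thing to be aware of with the LEFT JOIN version: a person with no rows at one level gets a missing value in that column rather than 0. If zeros are preferred, wrapping the counts in COALESCE should do it. This is a sketch under the same assumptions as the query above (table old with personid, edu_level, study); the output name new_filled is hypothetical.

```sas
proc sql;
    create table new_filled as
    select p.personid,
           /* Replace a missing count (no rows at that level) with 0 */
           coalesce(b.num_bachelors, 0) as num_bachelors,
           coalesce(m.num_masters, 0) as num_masters
    from (select distinct personid from old) as p
    left join (select personid,
                      count(distinct study) as num_bachelors
               from old
               where edu_level = 'bachelor'
               group by personid) as b
        on p.personid = b.personid
    left join (select personid,
                      count(distinct study) as num_masters
               from old
               where edu_level = 'master'
               group by personid) as m
        on p.personid = m.personid;
quit;
```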