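My data is a set of schools, their performance on assessments in certain subjects, and the gender split of students taking each course. I have created a sample dataset below: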
data have;
input school $ subject $ perc_male perc_female score similar_school $;
datalines;
X math 51 49 93 Y
X english 48 52 95 Y
X tech 60 40 90 Y
X science 57 43 92 Y
Y math . . 87 X
Y english . . 83 X
Y science . . 81 X
Y language . . 91 X
Z math 40 60 78 Z
Z english 50 50 76 Z
Z science 45 55 80 Z
;
run;
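As you can see, gender percentages were not collected for school Y. Research suggests that school X has a very similar gender distribution, so I would like to impute the subject-specific percentages from X to Y. A further issue is that Y has a language score, whereas X was not assessed in that subject. In that case I would like to take the mean of the imputed values (51, 48, 57), which gives 52 as the percentage of male learners in the language course.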
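To make explicit which values go into that fallback mean, here is a minimal sketch that assumes the have dataset defined above. The column aliases fallback_male and fallback_female are hypothetical names used only for illustration. It averages X's percentages over only the subjects that Y also has, so tech (60) is left out.

```sas
proc sql;
  /* Average school X's percentages over the subjects that Y shares with it. */
  /* fallback_male / fallback_female are illustrative names, not part of have. */
  select mean(x.perc_male)   as fallback_male
       , mean(x.perc_female) as fallback_female
  from have as y
  inner join have as x
    on  x.school  = y.similar_school
    and x.subject = y.subject
  where y.school = 'Y';
quit;
```

With the sample data this should return 52 and 48, the values I would want for Y's language row.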
The following demonstrates the output I want:
data want;
input school $ subject $ perc_male perc_female score similar_school $;
datalines;
X math 51 49 93 Y
X english 48 52 95 Y
X tech 60 40 90 Y
X science 57 43 92 Y
Y math 51 49 87 X
Y english 48 52 83 X
Y science 57 43 81 X
Y language 52 48 91 X
Z math 40 60 78 Z
Z english 50 50 76 Z
Z science 45 55 80 Z
;
run;
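I got a downvote, so I am adding what I have tried, which gets me almost to where I need to be. To whoever downvoted, I would like to know if you have any constructive feedback. Thanks! I am wondering whether there is a way to build the mean-imputation part into my current snippet. I also suspect there is a more efficient way to do this. Any help would be greatly appreciated.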
proc sql;
select distinct cats("'",similar_school,"'") into :school_list separated by ','
from have
where perc_male=.;
quit;
proc sql;
create table stuff as
select similar_school as school, subject, perc_male, perc_female
from have
where school in (&school_list.);
quit;
proc sql;
create table want2 as
select a.school, a.subject, coalesce(a.perc_male,b.perc_male) as perc_male, coalesce(a.perc_female,b.perc_female) as perc_female, a.score, a.similar_school
from have as a
left join stuff as b
on a.school=b.school and a.subject=b.subject
;
quit;
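Answer 0 (score: 1)
Based on your expected data, a simple SQL query can solve your problem. First do a self-join of the table on school and similar_school, then coalesce the perc_male and perc_female values. That takes care of your first problem. For the second part, compute the mean for each school and coalesce perc_male and perc_female with that school-level mean. Take a look at the SQL below and let me know if it helps.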
proc sql;
create table want as
/* Outer query: fill anything still missing with the school-level mean. */
/* PROC SQL remerges mean() onto every row of the school's group. */
select aa.school
, aa.subject
, coalesce(aa.perc_male, mean(aa.perc_male)) as perc_male
, coalesce(aa.perc_female,mean(aa.perc_female)) as perc_female
, aa.score
, aa.similar_school
from (
/* Inner query: self-join to borrow the similar school's values by subject. */
select a.school
, a.subject
, coalesce(a.perc_male ,b.perc_male) as perc_male
, coalesce(a.perc_female,b.perc_female) as perc_female
, a.score
, a.similar_school
from have as a
left join have as b
on b.school=a.similar_school
and a.subject=b.subject
) as aa
group by aa.school
;
quit;
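The query above relies on PROC SQL remerging summary statistics with the original rows, which SAS reports with a note in the log. If you would rather make that step explicit, a minimal sketch follows. It assumes the have dataset from the question, and the names step1, want_explicit, mean_male and mean_female are hypothetical, chosen only for illustration.

```sas
proc sql;
  /* Step 1: borrow the matching subject's percentages from the similar school. */
  create table step1 as
  select a.school
       , a.subject
       , coalesce(a.perc_male,   b.perc_male)   as perc_male
       , coalesce(a.perc_female, b.perc_female) as perc_female
       , a.score
       , a.similar_school
  from have as a
  left join have as b
    on  b.school  = a.similar_school
    and b.subject = a.subject;

  /* Step 2: join each row to its school's mean and use it only where */
  /* a percentage is still missing (e.g. Y's language row).           */
  create table want_explicit as
  select s.school
       , s.subject
       , coalesce(s.perc_male,   m.mean_male)   as perc_male
       , coalesce(s.perc_female, m.mean_female) as perc_female
       , s.score
       , s.similar_school
  from step1 as s
  left join (select school
                  , mean(perc_male)   as mean_male
                  , mean(perc_female) as mean_female
             from step1
             group by school) as m
    on m.school = s.school
  order by s.school, s.subject;
quit;
```

This trades one extra intermediate table for a query that does not depend on remerging, which may be easier to port to other SQL dialects. The order by clause is only there to give a stable row order for inspection.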