SAS: impute the mean if less than 80% of survey data values are missing

Date: 2019-04-26 21:58:26

Tags: sas mean missing-data survey

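I have a survey coded 1-5, with missing values marked as (.). How can I code the data to reflect the following:

If a patient has 80% or more of the values non-missing, code the missing items as the mean of the questions the patient answered. If a patient is missing more than 80% of the values, delete the record rather than setting the summary measure to missing for that patient.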

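data condomuse;
set int108;
run;

*BY-group processing in PROC MEANS requires data sorted by the BY variables;
proc sort data=condomuse;
by Intround sid;
run;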

proc means data=condomuse n nmiss missing;
var cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
by Intround sid;
run;

1 answer:

Answer 0 (score: 0)

This uses the following assumptions:

  • Each row/record is a unique person
  • All variables are numeric

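NMISS(), N() and CMISS() count missing or non-missing values across an array when called as FUNC(of array(*)), and DIM() returns the number of elements in an array, so together they avoid listing every variable.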
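A minimal sketch of these functions on a hypothetical five-item array (the data set `demo` and the variables `q1-q5` are made-up names, not from the question): the first row has one missing answer and the second has four.

```sas
*hypothetical demo data: five survey items per row, missing coded as .;
data demo;
    input q1-q5;
    array q(*) q1-q5;
    n_items    = dim(q);          /* number of elements in the array */
    n_answered = n(of q(*));      /* count of non-missing numeric values */
    n_missing  = nmiss(of q(*));  /* count of missing numeric values */
    c_missing  = cmiss(of q(*));  /* count of missing values, numeric or character */
    datalines;
1 2 . 4 5
. . . . 3
;
run;
```

For the first row this gives n_items = 5, n_answered = 4 and n_missing = 1; for the second, n_answered = 1 and n_missing = 4.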

The following step flags every record that is missing 80% or more of its values.

data temp; *temp is output data set name;
    set have; *have is input data set name;

    *create an array to avoid listing all variables later;
    array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;

    *calculate percent missing;
    Percent_Missing = NMISS(of vars_check(*)) / Dim(vars_check);

    if percent_missing >= 0.8 then exclude = 'Y';
    else exclude = 'N';

 run;
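With the 15 CUSES items in the array, DIM(vars_check) is 15, so the cutoff in the step above works out as follows, where m is the number of missing items for a record:

```latex
p = \frac{\mathrm{NMISS}}{\mathrm{DIM}} = \frac{m}{15}, \qquad
p \ge 0.8 \iff m \ge 0.8 \times 15 = 12
```

So a respondent who skipped 11 items (p ≈ 0.733) gets exclude = 'N', and one who skipped 12 (p = 0.8) gets exclude = 'Y'.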

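To replace missing values with the mean (or another statistic), PROC STDIZE can do it. Note that with METHOD=MEAN and REPONLY, each missing value is replaced by that variable's mean across the records being processed, not by the respondent's own mean, and the non-missing values are left unchanged.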

*temp is input data set name from previous step;
proc stdize data=temp out=temp_mean reponly method=mean;
*keep only records with less than 80% missing;
where exclude = 'N';

*list of vars to fill with mean;
VAR cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;

run;
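To see what METHOD=MEAN fills in, here is a minimal self-contained sketch on a hypothetical three-item data set (`survey`, `survey_filled` and `q1-q3` are made-up names). Each missing value is replaced by its column's mean: q3 for id 1 becomes mean(5, 3) = 4 and q2 for id 2 becomes mean(2, 4) = 3, whereas id 1's own mean of answered items would be 1.5.

```sas
*hypothetical demo data: id plus three items, missing coded as .;
data survey;
    input id q1 q2 q3;
    datalines;
1 1 2 .
2 3 . 5
3 5 4 3
;
run;

*REPONLY replaces only missing values and does not standardize the rest;
proc stdize data=survey out=survey_filled reponly method=mean;
    var q1 q2 q3;
run;
```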

The different standardization methods are listed here, but these are standardization methods, not imputation methods.
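The question asks for each missing item to be filled with the mean of the questions that patient answered (person-mean imputation), which is different from PROC STDIZE's per-variable mean. A minimal data-step sketch of that alternative, assuming a hypothetical data set `survey` with five items `q1-q5` (substitute the 15 CUSES variables), deletes records with 80% or more missing, the same cutoff as the flagging step, and fills the remaining gaps with the row mean from MEAN(), which ignores missing values:

```sas
*hypothetical demo data: id plus five items, missing coded as .;
data survey;
    input id q1-q5;
    datalines;
1 1 2 . 4 5
2 . . . . 3
3 2 2 4 . .
;
run;

data survey_imputed;
    set survey;
    array items(*) q1-q5;
    pct_missing = nmiss(of items(*)) / dim(items);
    *drop respondents missing 80% or more of the items;
    if pct_missing >= 0.8 then delete;
    *mean of the answered items only (MEAN skips missing values);
    row_mean = mean(of items(*));
    do i = 1 to dim(items);
        if missing(items(i)) then items(i) = row_mean;
    end;
    drop i;
run;
```

Here id 2 (4 of 5 missing) is deleted, id 1 gets q3 = 3, and id 3 gets q4 = q5 = 8/3. Whether to round the imputed values back to the 1-5 scale is a separate analysis decision.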