Question

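I want to create a new variable in my dataset. The variable is simply a binary indicator of whether someone has a tobacco-related disease. I am looking at patient data, and each patient has up to 9 diagnosis codes. I have a dataset called tobacco that stores all the tobacco disease codes.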

This is what I thought I could do:

data outpreg;
set outpreg;
if diag1 = tobacco OR diag2 = tobacco OR diag3 = tobacco or diag4 = tobacco or diag5 = tobacco or diag6 = tobacco or 
diag7 = tobacco or diag8 = tobacco or diag9 = tobacco then co2=1;
run;

But as it stands, this flags far too many records. Any help would be greatly appreciated.

Answer 1

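It's not doing what you want. Your current code is trying to compare the value of diag1 with a variable named tobacco in the same outpreg dataset. Since there is no variable tobacco, SAS creates a new variable tobacco and initializes it to missing (.). Any record with a missing diagnosis code then satisfies the comparison, which is likely why too many records get flagged. To do what you want, I would join the tobacco dataset to the outpreg dataset once for each diag variable.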

proc sql;
select
  o.*,
  t1.tobacco_cd is not null or
  t2.tobacco_cd is not null or
  t3.tobacco_cd is not null as co2
from
  outpreg as o
  left join tobacco as t1
  on o.diag1 = t1.tobacco_cd
  left join tobacco as t2
  on o.diag2 = t2.tobacco_cd
  left join tobacco as t3
  on o.diag3 = t3.tobacco_cd
;
quit;

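This checks each diag variable against the code list and sets co2 to 1 if any of them matches and 0 if none does. For example, if diag1 matches, then t1.tobacco_cd is not null is true and the whole expression evaluates to 1.

You will have to extend it to cover all nine variables, not just three.

Another option is to put your tobacco codes into a format, as Joe suggested in the question.

proc format;
  value $tobaccocd
    '30300','30301','30302','30303' = 'Tobacco'
    other = 'Not Tobacco';
run;

Then you can create the co2 variable in a data step, like so:

data outpreg2;
  set outpreg;
  if put(diag1,$tobaccocd.) = 'Tobacco' or
     put(diag2,$tobaccocd.) = 'Tobacco' or
     put(diag3,$tobaccocd.) = 'Tobacco' then co2 = 1;
  else co2 = 0;
run;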
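To cover all nine variables without writing nine conditions, the format approach can be combined with an array. The following is a minimal sketch, not a tested solution: it assumes diag1 through diag9 are character variables and that the tobacco dataset holds its codes in a character variable named tobacco_cd (the name used in the join above). The dataset names tobacco_codes and tobacco_fmt and the array name dx are made up for illustration. The first two steps build the $tobaccocd format from the tobacco dataset, so they replace the hand-typed PROC FORMAT rather than complement it.

```sas
/* Deduplicate the code list and drop blank codes (assumed character). */
proc sort data=tobacco(keep=tobacco_cd where=(tobacco_cd ne ' '))
          out=tobacco_codes nodupkey;
  by tobacco_cd;
run;

/* Build a control dataset for PROC FORMAT (hypothetical name tobacco_fmt). */
data tobacco_fmt;
  length fmtname $32 type $1 label $11 hlo $1;
  set tobacco_codes(rename=(tobacco_cd=start)) end=last;
  fmtname = 'tobaccocd';
  type    = 'C';              /* character format, used as $tobaccocd. */
  label   = 'Tobacco';
  hlo     = ' ';
  output;
  if last then do;
    /* Catch-all row: HLO='O' marks the OTHER range. */
    start = ' ';
    hlo   = 'O';
    label = 'Not Tobacco';
    output;
  end;
  keep fmtname type start label hlo;
run;

proc format cntlin=tobacco_fmt;
run;

/* Flag a patient if any of the nine diagnosis codes is a tobacco code. */
data outpreg2;
  set outpreg;
  array dx{9} diag1-diag9;    /* assumes diag1-diag9 exist and are character */
  co2 = 0;
  do i = 1 to dim(dx);
    if put(dx{i}, $tobaccocd.) = 'Tobacco' then do;
      co2 = 1;
      leave;                  /* one match is enough */
    end;
  end;
  drop i;
run;
```

Because the OTHER range maps everything else, including blank codes, to 'Not Tobacco', patients with fewer than nine diagnoses are not flagged by accident.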

SAS - Create a new variable if a condition matches another variable?
