Question

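My current problem is that I have data in one table, and another table contains value ranges that assign the values of each column to categories.

For example: for "city", values 0-6 should become 1, 7-16 should become 2, and 17+ should become 3.

Eventually I will have to use this code on a table with more than 100 columns and a total of 500 value ranges/categories.

I have working code that creates the categories and selects the columns one by one, but the main code (reading the conditions and applying them) eludes me.

In the example code below, test1 contains the original data, test2 contains the value ranges for all columns, and test3 contains the conditions for the selected column.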

proc sql noprint;

create table work.test1
(Id char(4),
        city num,
        country num);

insert into work.test1
    values('1639',5,42260)
    values('1065',10,38090)
    values('1400',15,29769);

create table work.test2
(condition char(7),
        g_l char(6),
        g_p char(6));

insert into work.test2
values('city',"low","6")
values('city',"7","16")
values('city',"17","high")
values('country',"low","1000")
values('country',"1001","high");
quit;

    %let zmien = "city";

    data work.test3 (where=(condition = &zmien));
    set work.test2;
    run;

    proc sql noprint; 
    select count(condition) into :ile_war 
    from work.test3; 
    quit;

    %let kat = 0; /* place where current category is stored */
    %let v_l = 0; /* place where lower border of the category is stored */
    %let v_h = 0; /* place where higher border of the category is stored */
    %macro kat(ile_war);

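My idea was to use a macro %DO loop to go through all the categories of each column. If I don't use a macro (and therefore, as far as I know, no loop) and use a simple assignment (x = y) in the IF instead of CALL SYMPUT, the whole idea works.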

%macro kat(ile);
%do a=1 %to &ile;
            data work.test4;
            set work.tesT3 point=a;

                    %if g_l = "low" %then %do;
                            call symput('kat',&a);
                            call symput('war_l',0);                     
                    %end;

                    %if g_l ~= "low" %then %do;
                            call symput('kat',&a);
                            call symput('war_l',g_l);
                    %end;

                    %if g_p = "high" %then %do;
                            call symput('war_h',9999999);
                    %end;

                    %if g_p ~= "high" %then %do;
                            call symput('war_h',g_p);
                    %end;
                            output;

            stop;

            data work.test1;
            modify work.test1(WHERE=(&zmien BETWEEN &war_l AND &war_h));
            &zmien=&kat;
            replace;
            run;

%end;
%mend;

Any help with the macro, or suggestions for doing this another way, would be greatly appreciated.

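Edit: So, trying the recommended PROC FORMAT, I ran into a problem. It works when I hard-code the ranges and the variable/column to change, but I don't know how to make it work when:

A) the column name is the content of a macro variable (the error says the format cannot be found or cannot be applied)

B) the ranges are in a dataset

How can I read the values of the variable column, insert them into a format, use it to categorize the data, and then overwrite it for use with the other columns?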

Answer 1

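With formats, it would look like this. You can automate the actual application of the formats further if needed, but this is the approach I'd recommend. You could build IF/THEN statements instead, but that seems like more work to me, and more finicky.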

*create formats from the data set, test2;
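data createFormats;
length fmtname $32 start end 8 hlo $3 label $40;
set test2;
by condition notsorted;
fmtname = catx('_', condition, 'fmt' );
/* numeric bounds: LOW and HIGH become missing and are flagged in HLO instead */
start = input(g_l, ?? 32.);
end = input(g_p, ?? 32.);
if g_l = 'low' then hlo = cats(hlo, 'L');
if g_p = 'high' then hlo = cats(hlo, 'H');
label = catx(" to ", g_l, g_p);
run;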

proc format cntlin=createFormats;
run;

title 'Original Data';
proc print data=test1;
run;

*recode into formats;

data new;
set test1;

*this part can be automated via a macro assuming you use consistent naming structure as here;

city_group = put(city, city_fmt.);
country_group = put(country, country_fmt.);

run;

title 'Recoded into new variables';
proc print data=new;
run;

*apply formats for display, will be honoured by most procs;
proc datasets lib=work nodetails nolist; 
modify test1;
*this could also be automated via a macro;
format city city_fmt. country country_fmt.;
run;quit;

title 'Original variables with formats applied';
proc print data=test1;
run;
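The comments in the code above say the recoding and the FORMAT statement can be automated with a macro when the names follow a consistent pattern. Here is one possible sketch. It assumes every variable to recode has a format named <variable>_fmt, which is what the CNTLIN step above creates. The macro name RECODE_ALL is hypothetical. The hand-written VALUE statements only stand in for the generated formats, so the block can run on its own.

```sas
/* Sample data, same values as TEST1 in the question */
data test1;
  input id $ city country;
  cards;
1639 5 42260
1065 10 38090
1400 15 29769
;
run;

/* Hand-written stand-ins for the formats built from TEST2 above */
proc format;
  value city_fmt    low-6     = 'low to 6'
                    7-16      = '7 to 16'
                    17-high   = '17 to high';
  value country_fmt low-1000  = 'low to 1000'
                    1001-high = '1001 to high';
run;

/* Hypothetical helper: for each name in a space-separated list, */
/* writes one  <var>_group = put(<var>, <var>_fmt.);  statement  */
%macro recode_all(vars);
  %local i var;
  %do i = 1 %to %sysfunc(countw(&vars));
    %let var = %scan(&vars, &i);
    &var._group = put(&var, &var._fmt.);
  %end;
%mend recode_all;

data new;
  set test1;
  %recode_all(city country)
run;
```

In `&var._fmt.`, the first period ends the macro variable reference and the trailing period belongs to the format name. For CITY, the generated statement is therefore `city_group = put(city, city_fmt.);`.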

Answer 2

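So it sounds like you want to use the format data in TEST2 to convert the values in TEST1 into codes. For CITY you have three levels, so you want to generate the values 1, 2, 3. You can do that with a format. However, if you want the result to be numeric rather than a string, you will need an INPUT() function call to convert the formatted value back into a number.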
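Here is a minimal sketch of that PUT()/INPUT() round trip. It uses a hand-written format in place of the one generated from TEST2. The format name CITYDEMO and the dataset DEMO are hypothetical, chosen so they do not clash with the names used below.

```sas
/* Hand-written stand-in for the CITY ranges in TEST2.      */
/* CITYDEMO is a hypothetical name used only for this demo. */
proc format;
  value citydemo low-6   = '1'
                 7-16    = '2'
                 17-high = '3';
run;

data demo;
  input city;
  city_char = put(city, citydemo.);   /* formatted value: character '1', '2' or '3' */
  city_num  = input(city_char, 32.);  /* converted back to the number 1, 2 or 3 */
  cards;
5
10
15
;
run;
```

PUT() returns the format label as text, and INPUT() with a numeric informat such as 32. turns that text back into a number.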

First, let's create the sample data with regular SAS code. Test data is much easier to edit this way than with SQL INSERT statements.

data test1;
  input id $ city country ;
cards;
1639 5 42260
1065 10 38090
1400 15 29769
;
data test2;
  input condition $ g_l $ g_p $ ;
cards;
city low 6
city 7 16
city 17 high
country low 1000
country 1001 high
;

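We can convert the TEST2 dataset into formats by using a data step to create the control data that PROC FORMAT needs to define them. Assume the rows are grouped by CONDITION and, within each group, are in the order of the categories you want to create, so that we can generate the category numbers. I'll also assume that CONDITION is a valid format name (it starts with a letter or underscore and does not end with a digit).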

data formats ;
  length fmtname $32 start end 8 hlo $3 label $32 ;
  keep fmtname -- label;
  set test2;
  by condition notsorted;
  if first.condition then row=1;
  else row + 1;
  fmtname = condition ;
  start=input(g_l,??32.);
  end=input(g_p,??32.);
  if g_l='low' then hlo=cats(hlo,'L');
  if g_p='high' then hlo=cats(hlo,'H');
  label = left(put(row,32.));
run;

proc format cntlin=formats ;
run;

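To convert values into category numbers with these formats, we need to generate some code. If the list of variables is small enough, you can put that code into a single macro variable (maximum length 64K bytes).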

For example, suppose we want to generate a new variable with the suffix _GRP for every variable in the input dataset TEST1 whose name appears in the list of conditions in the metadata table TEST2. We could use code like this to generate the macro variable:

proc contents data=test1 out=contents noprint;
run;

proc sql noprint ;
select distinct cats(name,'_grp=input(put(',name,',',name,'.),32.)')
  into :recode separated by ';'
  from contents
  where upcase(name) in (select upcase(condition) from test2)
;
quit ;

For your example, the macro variable RECODE looks like this:

city_grp=input(put(city,city.),32.);country_grp=input(put(country,country.),32.)

You can then use it in a data step to create a new dataset from the old one:

data want ;
  set test1 ;
  &recode;
run;

Result:

                                              country_
Obs     id     city    country    city_grp       grp

 1     1639      5      42260         1           2
 2     1065     10      38090         2           2
 3     1400     15      29769         2           2

If you have many variables to recode, you can write the code to a file instead of generating a macro variable:

proc sql noprint ;
create table names as
  select distinct name
  from contents
  where upcase(name) in (select upcase(condition) from test2)
;
quit ;

filename code temp;
data _null_;
  set names ;
  file code ;
  put name +(-1) '_grp=input(put(' name ',' name +(-1) '.),32.);' ;
run;

data want ;
  set test1 ;
  %include code / source2;
run;

You might also want to generate another set of formats that decode the category numbers back into descriptions. For the new CITY_GRP variable, for example, you could generate a format CITY_GRP. that converts 1 to 'low - 6', and so on.

data format2 ;
  length fmtname $32 start 8 label $50 ;
  keep fmtname -- label;
  set test2;
  by condition notsorted;
  if first.condition then row=1;
  else row + 1;
  fmtname = catx('_',condition,'grp') ;
  start=row ;
  label = catx(' - ',g_l,g_p);
run;

proc format cntlin=format2; run;

proc print data=want;
 format city_grp city_grp. country_grp country_grp.;
run;
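Case A in the question's edit has the column name held in a macro variable. The same pattern handles that one column at a time. This minimal sketch assumes two things: the macro variable holds the bare name without quotes, and a numeric format with the same name as the column exists, as this answer creates. The hand-written VALUE statement stands in for the generated CITY format, and WANT1 is a hypothetical output name. With `%let zmien = "city";` as in the question, `put(&zmien, ...)` would format the character constant "city" instead of the column, and SAS would then look for a character ($) format.

```sas
/* Sample data, same values as TEST1 */
data test1;
  input id $ city country;
  cards;
1639 5 42260
1065 10 38090
1400 15 29769
;
run;

/* Hand-written stand-in for the CITY format generated from TEST2 */
proc format;
  value city low-6 = '1' 7-16 = '2' 17-high = '3';
run;

/* The macro variable holds the bare column name, no quotes */
%let zmien = city;

data want1;
  set test1;
  /* resolves to: city_grp = input(put(city, city.), 32.); */
  &zmien._grp = input(put(&zmien, &zmien..), 32.);
run;
```

In `&zmien..`, the first period ends the macro variable reference and the second is the period that every format name needs.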

SAS - Modify one dataset using conditions from another dataset
