使用do loop-SAS在数据步骤中定义一个带变量的过滤器

时间:2019-01-31 10:54:10

标签: loops sas iteration datastep

早上好, 我有这个问题。

有2个数据集

数据集“ ID客户”,其中我有这个:

id       |  Customer Name   |
-----------------------------
123456   | Michael One      |
123123   | George Two       |
123789   | James Three      |

和第二个名为“交易”的数据集:

id       |  Transaction | Date
-----------------------------------
123456   | Fuel         | 01NOV2018
123456   | Fuel         | 03NOV2018
123123   | Fuel         | 10NOV2018
123456   | Fuel         | 25NOV2018
123123   | Fuel         | 13NOV2018
123456   | Fuel         | 10DEC2018
123789   | Fuel         | 1NOV2018
123123   | Fuel         | 30NOV2018
123789   | Fuel         | 15DEC2018

我想要的结果是创建3个db,就像我在第一个名为的数据集中的3个客户ID一样:

_01NOV2018_15NOV_123456_F
_01NOV2018_15NOV_123123_F
_01NOV2018_15NOV_123789_F

其中包含:

For  _01NOV2018_15NOV_123456_F :
id       |  Transaction | Date
-----------------------------------
123456   | Fuel         | 01NOV2018
123456   | Fuel         | 03NOV2018

For _01NOV2018_15NOV_123123_F :

id       |  Transaction | Date
-----------------------------------
123123   | Fuel         | 10NOV2018
123123   | Fuel         | 13NOV2018

For _01NOV2018_15NOV_123789_F

empty

我需要为一个子句创建一个变量,该子句在数据步骤中……我该怎么做?

感谢您的帮助! :)`

2 个答案:

答案 0 :(得分:0)

HASH OUTPUT方法是在DATA步骤运行时创建动态命名输出数据集的唯一方法。根据问题注释,您很可能不想将原始数据集拆分为许多内容命名的片段。无论如何,例如SAS中的过程称为拆分。

学习如何在DATA步骤和PROC步骤中应用WHERE语句和BY组处理将为您提供更好的服务。

所需的输出似乎根据月的一半进行了分离或分类。计算一个包含适当分类值的新semimonth变量,然后在下游使用该变量,例如在PROC PRINT中,可能会为您提供最好的服务。

data customers;
infile cards dlm='|';
attrib
  id length=8
  name length=$20
;
input id name ;
datalines;
123456   | Michael One      |
123123   | George Two       |
123789   | James Three      |
run;

data transactions;
infile cards dlm='|';
attrib
  id length=8
  transaction length=$10
  date length=8 format=date9. informat=date9.
;
input id transaction date;
datalines;
123456   | Fuel         | 01NOV2018
123456   | Fuel         | 03NOV2018
123123   | Fuel         | 10NOV2018
123456   | Fuel         | 25NOV2018
123123   | Fuel         | 13NOV2018
123456   | Fuel         | 10DEC2018
123789   | Fuel         | 1NOV2018
123123   | Fuel         | 30NOV2018
123789   | Fuel         | 15DEC2018
run;

proc sort data=customers;
  by id;
proc sort data=transactions;
  by id date;

* merge datasets and compute semimonth;

data want;
  merge transactions customers;
  by id;

  semimonth = intnx('month',date,0) + 16 * (day(date) > 15);

  attrib semimonth
    format=date9.
    label="Semi-month"
  ;
run;


* process data by semimonth and id, restricting with where;

proc print data=want;
  by semimonth id;
  where semimonth = '01NOV2018'D;
run;

enter image description here

答案 1 :(得分:0)

您可以通过一个小宏或仅对proc导出代码进行快速过滤来执行此操作。

proc export data=sashelp.class (where=(sex='F')) outfile='/folders/myfolders/females.xlsx' dbms=xlsx replace; run;

proc export data=sashelp.class (where=(sex='M')) outfile='/folders/myfolders/females.xlsx' dbms=xlsx replace; run;

或者您可以将其转换为一个小宏:

    %macro exportData(group=);

proc export data=sashelp.class (where=(sex="&group."))
outfile="C:\_localdata\&group..xlsx" 
dbms=xlsx 
replace; 
run;

%mend;

*create list of unique elements to call the macro;
proc sort data=sashelp.class nodupkey out=class;
by sex;
run;

*call the macro once for each group;  

data test;
   set class;
    str = catt('%exportData(group=', sex, ');');
    call execute(str);
run;