SAS recursive join

Time: 2015-09-24 23:20:02

Tags: sas

I have a large connection table and want to expand it to include recursive joins.

My data looks like this:

data city_list;  

input from_city $ to_city $;
datalines;  
PORTLAND SEATTLE
SEATTLE BOISE
BOISE PORTLAND
PORTLAND HELENA
NYC ORLANDO
ORLANDO MIAMI
;
run;

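I want to expand the data set to include stopovers, so that it ends up looking like this. I don't care whether I end up with both a "PORTLAND/SEATTLE" and a "SEATTLE/PORTLAND" record; I can deal with those if necessary.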

BOISE   HELENA
BOISE   PORTLAND
BOISE   SEATTLE
NYC MIAMI
NYC ORLANDO
ORLANDO MIAMI
PORTLAND    HELENA
PORTLAND    SEATTLE
SEATTLE HELENA

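I tried the following macro, but ran into performance problems when there are too many levels of recursion. I believe the best option is a hash table, but I'm not sure how to code this exact scenario.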

data city_list;  

input from_city $ to_city $;
datalines;  
PORTLAND SEATTLE
SEATTLE BOISE
BOISE PORTLAND
PORTLAND HELENA
NYC ORLANDO
ORLANDO MIAMI
;
run;

%macro RecurJoin(
baseTbl,
destTbl,
baseKey,
compKey
);

Proc SQL;
Create Table WORK.RECUR_JOIN_TBL as
SELECT distinct Base.&baseKey, Connect.&compkey
  FROM &baseTbl AS Base
       INNER JOIN &baseTbl AS Connect
          ON (Base.&compkey = Connect.&baseKey)
       LEFT JOIN &baseTbl AS Subbase
          ON (Base.&baseKey = Subbase.&baseKey) AND
             (Connect.&compkey = Subbase.&compkey)
 WHERE Subbase.&baseKey IS NULL;
quit;

  proc sql noprint;
    select count(1) into :connectCnt from RECUR_JOIN_TBL;
  quit;

Data &destTbl;
  set &baseTbl
      RECUR_JOIN_TBL;
run;

    Proc DataSets nolist;
        Delete RECUR_JOIN_TBL;
    Quit;

%if &connectCnt > 0 %then %do;
    %RecurJoin(
    baseTbl=&destTbl,
    destTbl=&destTbl,
    baseKey=&baseKey,
    compKey=&compKey
    );
%end;

%mend;

%RecurJoin(
baseTbl=city_list,
destTbl=FNL_CITY_LIST,
baseKey=from_city,
compKey=to_city
);

Proc Sort data=WORK.FNL_CITY_LIST (where=(NOT(from_city=to_city)));
  by from_city to_city;
run;

1 answer:

Answer 0 (score: 0)

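Memory permitting, you can use the hash-based approach I proposed in this answer to identify the groups of connected cities in your data set. After that, you only need to generate one row for each pair of cities in the same group, which is easily done with a Cartesian join in proc sql.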
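The linked answer's code is not reproduced here, so the following is only a rough sketch of the idea, not that answer's implementation. It assumes the links can be treated as undirected, that city names fit in 8 characters (the default for list input with `$`), and that all distinct cities fit in memory. The names `city_groups`, `member`, `root`, `r1` and `r2` are made up for illustration. A hash object stores a parent for each city (a simple union-find), each city is then resolved to the root of its group, and proc sql pairs up cities that share a root.

```sas
/* Step 1: assign every city to a group of connected cities. */
data city_groups (keep=member root);
  length city parent member root r1 r2 $8;

  /* hash keyed by city; the data portion holds the city's parent */
  declare hash p();
  rc = p.defineKey('city');
  rc = p.defineData('city', 'parent');
  rc = p.defineDone();
  declare hiter it('p');

  do until (eof);
    set city_list end=eof;

    /* register each city as its own parent the first time it is seen */
    city = from_city;
    if p.find() ne 0 then do; parent = city; rc = p.add(); end;
    city = to_city;
    if p.find() ne 0 then do; parent = city; rc = p.add(); end;

    /* follow parent links up to the root of each endpoint */
    city = from_city; rc = p.find();
    do while (parent ne city); city = parent; rc = p.find(); end;
    r1 = city;
    city = to_city; rc = p.find();
    do while (parent ne city); city = parent; rc = p.find(); end;
    r2 = city;

    /* different roots: merge the two groups */
    if r1 ne r2 then do; city = r2; parent = r1; rc = p.replace(); end;
  end;

  /* write one row per city together with the root of its group */
  rc_it = it.first();
  do while (rc_it = 0);
    member = city;
    do while (parent ne city); city = parent; rc = p.find(); end;
    root = city;
    output;
    rc_it = it.next();
  end;
  stop;
run;

/* Step 2: Cartesian join of cities that share a root. */
proc sql;
  create table fnl_city_list as
  select a.member as from_city, b.member as to_city
    from city_groups as a, city_groups as b
   where a.root = b.root and a.member ne b.member
   order by from_city, to_city;
quit;
```

As written, this returns every pair in both directions (both PORTLAND/SEATTLE and SEATTLE/PORTLAND). Changing the condition `a.member ne b.member` to `a.member < b.member` should keep only one row per unordered pair. Parent chains are not compressed here, so very long chains of cities may be slow to resolve.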