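I have a large table of connections and want to expand it to include the indirect (recursive) connections as well.
My data looks like this: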
data city_list;
input from_city $ to_city $;
datalines;
PORTLAND SEATTLE
SEATTLE BOISE
BOISE PORTLAND
PORTLAND HELENA
NYC ORLANDO
ORLANDO MIAMI
;
run;
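I want to expand the data set to include stopovers, so that it ends up looking like the list below. I don't mind if I end up with both a "Portland/Seattle" and a "Seattle/Portland" record; I can deal with those if necessary.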
BOISE HELENA
BOISE PORTLAND
BOISE SEATTLE
NYC MIAMI
NYC ORLANDO
ORLANDO MIAMI
PORTLAND HELENA
PORTLAND SEATTLE
SEATTLE HELENA
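I tried the following macro, but ran into performance problems when there are too many levels of recursion. I believe a hash table is the best option, but I'm not sure how to code it for this exact scenario.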
%macro RecurJoin(
    baseTbl,
    destTbl,
    baseKey,
    compKey
);
    /* Keep the row counter local to each (recursive) invocation */
    %local connectCnt;

    /* Find two-step connections (A->B and B->C give A->C) not already in the base table */
    Proc SQL;
        Create Table WORK.RECUR_JOIN_TBL as
        SELECT distinct Base.&baseKey, Connect.&compKey
        FROM &baseTbl AS Base
        INNER JOIN &baseTbl AS Connect
            ON (Base.&compKey = Connect.&baseKey)
        LEFT JOIN &baseTbl AS Subbase
            ON (Base.&baseKey = Subbase.&baseKey) AND
               (Connect.&compKey = Subbase.&compKey)
        WHERE Subbase.&baseKey IS NULL;
    quit;

    /* Count the new connections found in this pass */
    proc sql noprint;
        select count(1) into :connectCnt from RECUR_JOIN_TBL;
    quit;

    /* Append the new connections to the existing ones */
    Data &destTbl;
        set &baseTbl
            RECUR_JOIN_TBL;
    run;

    Proc DataSets nolist;
        Delete RECUR_JOIN_TBL;
    Quit;

    /* Recurse until a pass finds no new connections */
    %if &connectCnt > 0 %then %do;
        %RecurJoin(
            baseTbl=&destTbl,
            destTbl=&destTbl,
            baseKey=&baseKey,
            compKey=&compKey
        );
    %end;
%mend;

%RecurJoin(
    baseTbl=city_list,
    destTbl=FNL_CITY_LIST,
    baseKey=from_city,
    compKey=to_city
);

/* Drop self-connections (from_city = to_city) and sort the result */
Proc Sort data=WORK.FNL_CITY_LIST (where=(NOT(from_city=to_city)));
    by from_city to_city;
run;
Answer 0 (score: 0)
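Memory permitting, you can use the hash-based approach I proposed in this answer to identify the groups of connected cities in your data set. You then only need to generate one row for each pair of cities in the same group, which is easily done with a Cartesian join in proc sql.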
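A minimal sketch of this idea follows. It is not the code from the linked answer; it is one possible way to do the grouping, under a few assumptions: the city_list data set from the question already exists, city names fit in 8 characters, and connections are treated as undirected. The names city_groups, group_id and fnl_city_list are made up for illustration. Each city is labelled with the alphabetically smallest city it is connected to, by making repeated passes over the connections until no label changes; cities that share a label form one group.

```sas
/* Step 1: label every city with a group id held in a hash object.        */
/* Assumption: the group id is the alphabetically smallest city name      */
/* reachable from that city.                                              */
data _null_;
    length city group_id lab_from lab_to $8;

    declare hash grp(ordered: 'a');
    grp.defineKey('city');
    grp.defineData('city', 'group_id');
    grp.defineDone();

    /* Repeat full passes over the connections until nothing changes */
    do until (not changed);
        changed = 0;
        do i = 1 to n_edges;
            set city_list point=i nobs=n_edges;

            /* Current label of each endpoint; an unseen city labels itself */
            city = from_city;
            if grp.find() ne 0 then group_id = from_city;
            lab_from = group_id;

            city = to_city;
            if grp.find() ne 0 then group_id = to_city;
            lab_to = group_id;

            /* Differing labels mean this pass still has work to do */
            if lab_from ne lab_to then changed = 1;

            /* Both endpoints take the alphabetically smaller label */
            group_id = ifc(lab_from <= lab_to, lab_from, lab_to);
            city = from_city; grp.replace();
            city = to_city;   grp.replace();
        end;
    end;

    /* Write one row per city: city, group_id */
    grp.output(dataset: 'city_groups');
    stop;
run;

/* Step 2: Cartesian join within each group, one row per pair of cities */
proc sql;
    create table fnl_city_list as
    select a.city as from_city, b.city as to_city
    from city_groups as a, city_groups as b
    where a.group_id = b.group_id
      and a.city < b.city
    order by from_city, to_city;
quit;
```

For the sample data this should give the same nine city pairs as the desired output, with each pair written in alphabetical order (for example HELENA PORTLAND rather than PORTLAND HELENA). Replacing a.city < b.city with a.city ne b.city would emit both directions. Only the city-to-group lookup is held in memory, but the number of passes grows with the longest chain of connections, so for very long chains a union-find style merge may be a better fit.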