SAS pass-through facility: how do I insert a large list from a local table into a query?

Asked: 2019-04-01 19:59:54

Tags: sas sas-macro proc-sql pass-through

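I need to query a large table on a server (REMOTE_TBL) using the SAS pass-through facility. To shorten the query time, I want to send only a list of IDs extracted from a local table (LOCAL_TBL). My first step is to use the INTO clause to put the IDs into a macro variable named id_list: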

select distinct ID into: id_list separated by ',' from WORK.LOCAL_TBL 

I then pass these IDs to the pass-through query:

PROC SQL;
CONNECT TO sybaseiq AS dbcon
(host="name.cl" server=alias db=iws user=sas_user password=XXXXXX);

create table WANT as 
select * from connection to dbcon(
  select * 
  from    dbo.REMOTE_TBL
  where   ID in (&id_list)
);
QUIT;

The code runs fine, except that I get the following message:

The length of the value of the macro variable exceeds the maximum length

Is there a simpler way to send the selected IDs to the pass-through query? Is there a way to store the selected IDs in two or more variables?

4 answers:

Answer 0 (score: 2)

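Store the values in multiple macro variables, then store the names of those macro variables in another macro variable.

So this code generates a series of macro variables named M1, M2, ... and then sets ID_LIST to &M1,&M2,...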

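data _null_;
  length list $20200 mlist $20000;
  retain mlist;  /* keep the list of &m references across iterations */
  /* collect IDs until the chunk passes 20,000 characters or input ends */
  do until(eof or length(list)>20000);
    set LOCAL_TBL end=eof;
    list=catx(',',list,id);
  end;
  call symputx(cats('m',_n_),list);
  mlist=catx(',',mlist,cats('&m',_n_));
  if eof then call symputx('id_list',mlist);
run;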

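Then, when you expand ID_LIST, the macro processor expands all of the individual Mx macro variables. This small data step creates a couple of sample macro variables to demonstrate the idea.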

data _null_;
  call symputx('m1','a,b,c');
  call symputx('m2','d,e,f');
  call symputx('id_list','&m1,&m2');
run;

Result:

70    %put ID_LIST=%superq(id_list);
ID_LIST=&m1,&m2
71    %put ID_LIST=&id_list;
ID_LIST=a,b,c,d,e,f
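
One way to sanity-check the chunking before sending anything to the server is to run it against throwaway data and use the resulting list in a purely local query. The sketch below assumes numeric IDs; the table name SAMPLE_IDS, its 10,000 rows and the macro variable N_MATCHED are made up for illustration and are not part of the question.

```sas
/* Hypothetical sample data: 10,000 six-digit numeric IDs,
   about 70,000 characters once joined with commas */
data sample_ids;
  do id = 100001 to 110000;
    output;
  end;
run;

/* Same chunking idea as above, pointed at the sample table */
data _null_;
  length list $20200 mlist $20000;
  retain mlist;
  do until(eof or length(list) > 20000);
    set sample_ids end=eof;
    list = catx(',', list, id);
  end;
  call symputx(cats('m', _n_), list);
  mlist = catx(',', mlist, cats('&m', _n_));
  if eof then call symputx('id_list', mlist);
run;

/* Local check: count the sample rows matched by the expanded list */
proc sql noprint;
  select count(*) into :n_matched trimmed
  from sample_ids
  where id in (&id_list);
quit;

%put NOTE: matched &n_matched rows;
```

If the chunks cover every ID, the matched count should equal the number of rows in the sample table; a smaller count suggests that some chunks were lost along the way.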

Answer 1 (score: 1)

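You are passing many data values that appear in an IN (…) clause. The number of values allowed varies from database to database; some may limit a clause to 250 values, and there may also be a limit on the length of a statement. If the macro variable produces a value list 20,000 characters long, the remote side might not accept it.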

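When a query might involve more than 100 values, first take some time to talk to the database administrator about getting the rights to create temporary tables. With those rights, your query becomes more efficient on the remote side: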

… upload id values to #myidlist … 
create table WANT as 
select * from connection to dbcon(
  select * 
  from    dbo.REMOTE_TBL
  where   ID in (select id from #myidlist)
);
QUIT;

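If you cannot get the appropriate permissions, you will have to chop the ID list into pieces and have a macro generate a series of ORed IN searches.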

1=0
OR ID IN ( … list-values-1 … )
… 
OR ID IN ( … list-values-N … )

For example:

data have;
  do id = 1 to 44;
    output;
  end;
run;

%let IDS_PER_MACVAR = 10;  * <---------- make as large as you want until error happens again;

* populate the macro vars holding the chopped up ID list;
data _null_;
  length macvar $20;    retain macvar;
  length macval $32000; retain macval;
  set have end=end;

  if mod(_n_-1, &IDS_PER_MACVAR) = 0 then do;

    if not missing(macval) then call symput(macvar, trim(macval));

    group + 1;
    macvar = cats('idlist',group);
    macval = '';
  end;

  macval = catx(',',macval,id);

  if end then do;
    if not missing(macval) then call symput(macvar, trim(macval));
    call symputx ('MVARCOUNT', group);
  end;
run;

* macro that assembles the chopped up bits as a series of ORd INs;
%macro id_in_ors (N=,NAME=);
   %local i;

1 = 0

   %do i = 1 %to &N;
OR ID IN (&&&NAME.&i)
   %end;

%mend;

* use %put to get a sneak peek at what will be passed through;
%put %id_in_ors(N=&MVARCOUNT,NAME=IDLIST);


* actual sql with pass through;
...

create table WANT as 
select * from connection to dbcon(
  select * 
  from    dbo.REMOTE_TBL
  where   ( %ID_IN_ORS(N=&MVARCOUNT,NAME=IDLIST) )  %* <--- idlist piecewise ors ;
);

...    

Answer 2 (score: 0)

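I suggest first saving all the distinct values to a table and then (again with proc sql + into) loading the values into several separate macro variables, reading the table several times, one subset at a time. The subsets must be mutually exclusive and jointly exhaustive.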
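
A minimal sketch of that idea, assuming WORK.LOCAL_TBL is not empty and that 1,000 IDs per macro variable is a comfortable size. The names CHUNK_SIZE, NUMBERED_IDS, N_CHUNKS, LOAD_CHUNKS and IDS1, IDS2, ... are invented for this illustration.

```sas
/* Assumption: 1,000 IDs per chunk keeps each list well under the limit */
%let chunk_size = 1000;

/* Save the distinct values to a table first */
proc sort data=WORK.LOCAL_TBL(keep=ID) out=distinct_ids nodupkey;
  by ID;
run;

/* Assign every ID to exactly one chunk, so the chunks are
   mutually exclusive and together cover the whole table */
data numbered_ids;
  set distinct_ids;
  chunk = ceil(_n_ / &chunk_size);
run;

proc sql noprint;
  select max(chunk) into :n_chunks trimmed from numbered_ids;
quit;

/* Read the table once per chunk and load IDS1, IDS2, ... */
%macro load_chunks;
  %local i;
  %do i = 1 %to &n_chunks;
    %global ids&i;
    proc sql noprint;
      select ID into :ids&i separated by ','
      from numbered_ids
      where chunk = &i;
    quit;
  %end;
%mend load_chunks;

%load_chunks

%put NOTE: created &n_chunks macro variables, IDS1 to IDS&n_chunks;
```

The resulting variables can then be referenced one per IN (…) clause in the pass-through query. Character IDs would additionally need quoting, which this sketch does not handle.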

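Do you have CREATE privileges in the database where dbo.REMOTE_TBL lives? If so, you might also consider copying WORK.LOCAL_TBL into a temporary table in the database and running an inner join there.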
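
A rough sketch of the upload-and-join approach, under several assumptions: that the SAS/ACCESS LIBNAME engine accepts the same connection options as the CONNECT statement in the question, that the account may create tables, and that an unqualified table name resolves to the same schema in both steps. It uses an ordinary table rather than a true temporary table and drops it afterwards; DBLIB and ID_UPLOAD are hypothetical names.

```sas
/* Assumption: same connection options as the question's CONNECT statement */
libname dblib sybaseiq host="name.cl" server=alias db=iws
        user=sas_user password=XXXXXX;

/* ID_UPLOAD is a hypothetical helper table; requires CREATE privilege */
proc sql;
  create table dblib.ID_UPLOAD as
    select distinct ID from WORK.LOCAL_TBL;
quit;

proc sql;
  connect to sybaseiq as dbcon
    (host="name.cl" server=alias db=iws user=sas_user password=XXXXXX);

  /* The join runs entirely on the database side */
  create table WANT as
  select * from connection to dbcon(
    select r.*
    from   dbo.REMOTE_TBL r
    inner join ID_UPLOAD u
      on r.ID = u.ID
  );

  /* Remove the helper table again */
  execute (drop table ID_UPLOAD) by dbcon;
  disconnect from dbcon;
quit;

libname dblib clear;
```

With this shape no ID list passes through a macro variable at all, so the length limit never comes into play.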

Answer 3 (score: 0)

Another option: write the query to a temporary file, then %include it. No macro logic required!

proc sort 
  data = WORK.LOCAL_TBL(keep = ID) 
  out = distinct_ids 
  nodupkey;
  by ID;
run;

data _null_;
  set distinct_ids end = eof;
  file "%sysfunc(pathname(work))/temp.sas";
  if _n_ = 1 then put "PROC SQL;
    CONNECT TO sybaseiq AS dbcon
    (host=""name.cl"" server=alias db=iws user=sas_user password=XXXXXX);
    create table WANT as 
      select * from connection to dbcon(
        select * 
          from    dbo.REMOTE_TBL
          where   ID in (" @;
  put ID @;
  if not(eof) then put "," @;
  if eof then put ");QUIT;" @;
  put;
run;

/*Use nosource2 to avoid cluttering the log*/
%include "%sysfunc(pathname(work))/temp.sas" /nosource2;
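
The same pattern can be tried without a database connection by generating a purely local query first. The sketch below is a variation, not the code above: it uses a temporary fileref (QRY, a made-up name) instead of a path in the WORK directory, and a five-row sample table named SAMPLE_IDS, with WANT_LOCAL as the output table, both invented for illustration.

```sas
/* Hypothetical sample data standing in for the distinct ID table */
data sample_ids;
  do ID = 1 to 5;
    output;
  end;
run;

/* Temporary fileref; SAS picks the physical location */
filename qry temp;

/* Write a complete PROC SQL step, one ID per line */
data _null_;
  set sample_ids end=eof;
  file qry;
  if _n_ = 1 then
    put "proc sql; create table want_local as select * from sample_ids where ID in (";
  put ID @;
  if not eof then put ",";
  else put "); quit;";
run;

/* SOURCE2 echoes the generated code to the log for inspection */
%include qry / source2;

filename qry clear;
```

Once the generated code looks right in the log, the PUT statement can be swapped for the pass-through query and SOURCE2 for NOSOURCE2.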