PROC SQL INNER JOIN QUERY

Asked: 2015-09-08 03:48:03

Tags: sas proc-sql

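I am learning SAS PROC SQL. I noticed that although the two methods below return the same result, their real time and CPU time differ. I would like to know why the difference exists.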

data data1;
    input name1 $ choice $;
    datalines;
John A
Mary B
Peter C
;
run;

data data2;
    input name2 $ choice2 $;
    datalines;
John B
Mary C
Peter B
;
run;

Method 1:

proc sql;
    select a.*, b.*
    from data1 as a, data2 as b 
    where a.name1= data2.name2
    ;
quit;

Method 2:

proc sql;
    select a.* , b.*
    from data1 as a inner join data2 as b
        on a.name1 = b.name2
    ;
quit;


1 answer:

Answer 0 (score: 0)

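For the sake of discussion, setting aside the unexplained HTML file and any random fluctuation in CPU and execution time, the short answer is probably that SAS by default processes different joins in different ways. That may not make much difference for files as small as the ones in this example, but it is worth knowing about.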

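The longer answer is that this may depend to some extent on the exact version of SAS you are using. In SAS 9.4 with the sample datasets, if you leave proc sql to its own devices, I see that the generated query plan is identical for both joins: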

 52         /* Method 1: */
 53         
 54         proc sql _method;
 55             select a.*, b.*
 56             from data1 as a, data2 as b
 57             where a.name1= data2.name2
 58             ;

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjhsh
               sqxsrc( WORK.DATA1(alias = A) )
               sqxsrc( WORK.DATA2(alias = B) )
 59         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5469.21k
       OS Memory           32668.00k
       Timestamp           09/08/2015 06:43:09 PM
       Step Count                        457  Switch Count  50
       Page Faults                       0
       Page Reclaims                     87
       Page Swaps                        0
       Voluntary Context Switches        156
       Involuntary Context Switches      14
       Block Input Operations            0
       Block Output Operations           16


 60         /* Method 2: */
 61         
 62         proc sql _method;
 63             select a.* , b.*
 64             from data1 as a inner join data2 as b
 65                 on a.name1 = b.name2
 66             ;

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjhsh
               sqxsrc( WORK.DATA1(alias = A) )
               sqxsrc( WORK.DATA2(alias = B) )
 67         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5467.81k
       OS Memory           32924.00k
       Timestamp           09/08/2015 06:43:09 PM
       Step Count                        458  Switch Count  50
       Page Faults                       0
       Page Reclaims                     26
       Page Swaps                        0
       Voluntary Context Switches        167
       Involuntary Context Switches      11
       Block Input Operations            0
       Block Output Operations           8

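You can also confirm this with the _tree option, which produces a more detailed version of the query plan. For details on the output of the _method and _tree options, see here.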
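
As a minimal sketch of that check, both options can be requested on the same proc sql statement. This assumes WORK.DATA1 and WORK.DATA2 from the question already exist in the session; the exact layout of the tree that gets written to the log may vary between SAS versions.

```sas
/* Sketch: ask for both the execution-method summary (_method)  */
/* and the more detailed plan tree (_tree) for the same query.  */
/* Assumes WORK.DATA1 and WORK.DATA2 from the question exist.   */
proc sql _method _tree;
    select a.*, b.*
    from data1 as a inner join data2 as b
        on a.name1 = b.name2
    ;
quit;
```

Running the comma-join version of the query with the same two options and comparing the two logs side by side should show whether the planner treats the two syntaxes any differently on your installation.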

If you steer the query planner toward different join algorithms, some differences do appear:

 52         /* Method 1: */
 53         
 54         proc sql _method magic=101;
 55             select a.*, b.*
 56             from data1 as a, data2 as b
 57             where a.name1= data2.name2
 58             ;
 NOTE: PROC SQL planner chooses sequential loop join.

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjsl
               sqxsrc( WORK.DATA1(alias = A) )
               sqxsrc( WORK.DATA2(alias = B) )
 59         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.01 seconds
       memory              5468.53k
       OS Memory           32668.00k
       Timestamp           09/08/2015 06:41:54 PM
       Step Count                        451  Switch Count  52
       Page Faults                       0
       Page Reclaims                     101
       Page Swaps                        0
       Voluntary Context Switches        182
       Involuntary Context Switches      14
       Block Input Operations            0
       Block Output Operations           8


 60         /* Method 2: */
 61         
 62         proc sql _method magic=102;
 63             select a.* , b.*
 64             from data1 as a inner join data2 as b
 65                 on a.name1 = b.name2
 66             ;
 NOTE: PROC SQL planner chooses merge join.

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjm
               sqxsort
                   sqxsrc( WORK.DATA1(alias = A) )
               sqxsort
                   sqxsrc( WORK.DATA2(alias = B) )
 67         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5467.12k
       OS Memory           32924.00k
       Timestamp           09/08/2015 06:41:54 PM
       Step Count                        452  Switch Count  60
       Page Faults                       0
       Page Reclaims                     69
       Page Swaps                        0
       Voluntary Context Switches        197
       Involuntary Context Switches      13
       Block Input Operations            0
       Block Output Operations           16

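For details on the magic= option, see here. I would not recommend using it in any kind of production environment, but it can occasionally be useful for this sort of investigation.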

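Given how little the CPU time differs for files this small, even when SAS is forced to use different join methods, I strongly suspect that some other factor is responsible. Possibly the mysterious HTML file.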
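
If you want to test this yourself, one possible approach is to repeat the comparison on tables large enough for the timings to rise above the noise. The sketch below is only an illustration: the table names (big1, big2, out1, out2), the id, x and y columns, and the one-million-row size are all made up for the example. It writes the join results to tables instead of printing them, so that rendering output does not contribute to the measured times.

```sas
/* Sketch only: hypothetical test data, not part of the original question. */
options fullstimer;   /* write detailed resource statistics to the log */

/* Two made-up tables of one million rows each, keyed on an integer id. */
data big1;
    do id = 1 to 1000000;
        x = ranuni(1);
        output;
    end;
run;

data big2;
    do id = 1 to 1000000;
        y = ranuni(2);
        output;
    end;
run;

/* Method 1 style: comma join with a WHERE clause. */
proc sql _method;
    create table out1 as
    select a.id, a.x, b.y
    from big1 as a, big2 as b
    where a.id = b.id
    ;
quit;

/* Method 2 style: explicit INNER JOIN ... ON. */
proc sql _method;
    create table out2 as
    select a.id, a.x, b.y
    from big1 as a inner join big2 as b
        on a.id = b.id
    ;
quit;
```

If the two steps report the same execution methods and similar times over several runs, that would support the idea that the difference seen in the question comes from something other than the join syntax.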