I am learning SAS PROC SQL. I noticed that although the two methods below return the same result, their real time and CPU time differ. I would like to know why.
data data1;
input name1 $ choice $;
datalines;
John A
Mary B
Peter C
;
run;
data data2;
input name2 $ choice2 $;
datalines;
John B
Mary C
Peter B
;
run;
Method 1:
proc sql;
select a.*, b.*
from data1 as a, data2 as b
where a.name1= data2.name2
;
quit;
Method 2:
proc sql;
select a.* , b.*
from data1 as a inner join data2 as b
on a.name1 = b.name2
;
quit;
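Answer 0 (score: 0)
Setting aside, for the sake of argument, the unexplained HTML file and any random fluctuation in CPU and execution time, the short answer is probably that SAS handles different kinds of join in different ways by default. That may not matter much for files as small as the ones in this example, but it is worth knowing about.
The longer answer is that this probably depends to some extent on the exact version of SAS you are using. In SAS 9.4 with the example datasets, if you leave proc sql to its own devices, the query plan I see generated is identical for both joins: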
52 /* Method 1: */
53
54 proc sql _method;
55 select a.*, b.*
56 from data1 as a, data2 as b
57 where a.name1= data2.name2
58 ;
NOTE: SQL execution methods chosen are:
      sqxslct
          sqxjhsh
              sqxsrc( WORK.DATA1(alias = A) )
              sqxsrc( WORK.DATA2(alias = B) )
59 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5469.21k
OS Memory 32668.00k
Timestamp 09/08/2015 06:43:09 PM
Step Count 457 Switch Count 50
Page Faults 0
Page Reclaims 87
Page Swaps 0
Voluntary Context Switches 156
Involuntary Context Switches 14
Block Input Operations 0
Block Output Operations 16
60 /* Method 2: */
61
62 proc sql _method;
63 select a.* , b.*
64 from data1 as a inner join data2 as b
65 on a.name1 = b.name2
66 ;
NOTE: SQL execution methods chosen are:
      sqxslct
          sqxjhsh
              sqxsrc( WORK.DATA1(alias = A) )
              sqxsrc( WORK.DATA2(alias = B) )
67 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5467.81k
OS Memory 32924.00k
Timestamp 09/08/2015 06:43:09 PM
Step Count 458 Switch Count 50
Page Faults 0
Page Reclaims 26
Page Swaps 0
Voluntary Context Switches 167
Involuntary Context Switches 11
Block Input Operations 0
Block Output Operations 8
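You can also confirm this with the _tree option, which produces a more detailed version of the query plan. For details of the output from the _method and _tree options, see here.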
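As a minimal sketch of that check, the two options can be combined on a single proc sql statement. This assumes the data1 and data2 datasets from the question already exist in WORK. Both options are undocumented, so their output format may vary between SAS releases.

```sas
/* _method writes the chosen execution methods to the log;        */
/* _tree writes the more detailed internal query tree to the log. */
proc sql _method _tree;
  select a.*, b.*
  from data1 as a inner join data2 as b
  on a.name1 = b.name2
  ;
quit;
```

Running the same step with the comma-and-where form of the join and comparing the two log listings should show whether the planner treats the queries identically.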
If you steer the query planner toward different join algorithms, some differences do appear:
52 /* Method 1: */
53
54 proc sql _method magic=101;
55 select a.*, b.*
56 from data1 as a, data2 as b
57 where a.name1= data2.name2
58 ;
NOTE: PROC SQL planner chooses sequential loop join.
NOTE: SQL execution methods chosen are:
      sqxslct
          sqxjsl
              sqxsrc( WORK.DATA1(alias = A) )
              sqxsrc( WORK.DATA2(alias = B) )
59 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 5468.53k
OS Memory 32668.00k
Timestamp 09/08/2015 06:41:54 PM
Step Count 451 Switch Count 52
Page Faults 0
Page Reclaims 101
Page Swaps 0
Voluntary Context Switches 182
Involuntary Context Switches 14
Block Input Operations 0
Block Output Operations 8
60 /* Method 2: */
61
62 proc sql _method magic=102;
63 select a.* , b.*
64 from data1 as a inner join data2 as b
65 on a.name1 = b.name2
66 ;
NOTE: PROC SQL planner chooses merge join.
NOTE: SQL execution methods chosen are:
      sqxslct
          sqxjm
              sqxsort
                  sqxsrc( WORK.DATA1(alias = A) )
              sqxsort
                  sqxsrc( WORK.DATA2(alias = B) )
67 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5467.12k
OS Memory 32924.00k
Timestamp 09/08/2015 06:41:54 PM
Step Count 452 Switch Count 60
Page Faults 0
Page Reclaims 69
Page Swaps 0
Voluntary Context Switches 197
Involuntary Context Switches 13
Block Input Operations 0
Block Output Operations 16
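For details of the magic= option, see here. I would not recommend using it in any kind of production environment, but it can occasionally be useful for this sort of investigation.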
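Given how small the CPU time differences are for these files, even when SAS is forced to use different join methods, I strongly suspect that some other factor is responsible for what you saw. Possibly the mysterious HTML file.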
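One way to test that suspicion is to repeat the comparison on tables large enough for the join itself to dominate the timings, and to write the result to a table so that rendering printed output is not part of the measurement. The following is only a sketch under assumptions. The dataset names big1, big2, out1 and out2, the macro variable n, and the row count are all made up for illustration and do not come from the question.

```sas
%let n = 1000000;  /* hypothetical row count, large enough to time */

/* Build two test tables with the same layout as data1 and data2 */
data big1;
  length name1 $8 choice $1;
  do i = 1 to &n;
    name1  = put(i, z8.);            /* zero-padded key, e.g. 00000001 */
    choice = byte(65 + mod(i, 3));   /* A, B or C on ASCII systems */
    output;
  end;
  drop i;
run;

data big2;
  length name2 $8 choice2 $1;
  do i = 1 to &n;
    name2   = put(i, z8.);
    choice2 = byte(65 + mod(i + 1, 3));
    output;
  end;
  drop i;
run;

options fullstimer;  /* request detailed timing notes in the log */

/* Method 1: comma join with a where clause */
proc sql _method;
  create table out1 as
  select a.*, b.*
  from big1 as a, big2 as b
  where a.name1 = b.name2;
quit;

/* Method 2: explicit inner join */
proc sql _method;
  create table out2 as
  select a.*, b.*
  from big1 as a inner join big2 as b
  on a.name1 = b.name2;
quit;
```

If the two steps report the same execution methods and similar times over several runs, that would support the view that the difference seen on the three-row tables is noise or output overhead rather than a difference between the join syntaxes.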