SAS Proc SQL is very slow

Time: 2018-08-23 18:45:51

Tags: sas

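I have three tables:

  • a main table containing profit for each ID, year, and month;
  • a table containing tax_a for each ID, year, and month;
  • a table containing tax_b for each ID, year, and month.

profit is already accumulated by ID, year, and month, but the taxes are not. I tried to accumulate them with the solution below; it works, but it is very slow. How can I solve this more efficiently?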

proc sql;
create table final_table as
select t1.id, t1.year, t1.month, t1.profit,
    (select sum(t2.tax_a) from work.table_tax_a t2
     where ((t2.year = t1.year and t2.month <= t1.month) or (t2.year < t1.year)) and t2.id = t1.id) as tax_a,
    (select sum(t3.tax_b) from work.table_tax_b t3
     where ((t3.year = t1.year and t3.month <= t1.month) or (t3.year < t1.year)) and t3.id = t1.id) as tax_b
from work.main_table t1;
quit;

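1 answer:

Answer 0: (score: 2)

This is slow because you are running two summation subqueries for every row in main_table. If you pull that work out of the join and into temporary tables, it will run much faster.

Your inner query is just building a cumulative tax total over time for each ID.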

select sum(t2.tax_a) 
   from work.table_tax_a t2     
   where ((t2.year = t1.year and t2.month <= t1.month) or (t2.year < t1.year)) 
     and t2.id = t1.id

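(t2.year < t1.year) means you are accumulating across years as well as within a year. If that is your intent, compute the cumulative sum outside of SQL and then join the results back in.

This assumes your tables are sorted by id year month.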
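If the tax tables are not already in that order, the assumption can be made explicit by sorting them first. This is only a sketch: it sorts the WORK tables in place, which may not be what you want if other steps rely on their current order.

```sas
/* Sort each tax table so BY-group processing sees rows in id/year/month order. */
proc sort data=work.table_tax_a;
    by id year month;
run;

proc sort data=work.table_tax_b;
    by id year month;
run;
```

The cumulative-sum step then reads the sorted table.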

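data temp_a;
    set table_tax_a;
    /* Listing all three keys makes the step stop with an error if the input is not sorted. */
    by id year month;
    /* Keep the running total from one row to the next. */
    retain c_tax_a;
    /* Restart the running total at the first row of each ID. */
    if first.id then c_tax_a = 0;
    /* sum() ignores missing tax_a values, like sum() in PROC SQL. */
    c_tax_a = sum(c_tax_a, tax_a);
run;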

Do the same with table_tax_b to create temp_b. Then join them in SQL:
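For completeness, here is a sketch of the matching step for table_tax_b. It mirrors the temp_a step and assumes table_tax_b is also sorted by id, year, and month; it uses the sum() function so that a missing tax_b value does not wipe out the running total.

```sas
data temp_b;
    set table_tax_b;
    by id;
    /* Keep the running total from one row to the next. */
    retain c_tax_b;
    /* Restart the running total at the first row of each ID. */
    if first.id then c_tax_b = 0;
    /* Add the current month's tax, ignoring missing values. */
    c_tax_b = sum(c_tax_b, tax_b);
run;
```

With both temporary tables built, the join follows.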

proc sql noprint;
create table final_table2 as 
select t1.id, t1.year, t1.month, t1.profit, t2.c_tax_a as tax_a, t3.c_tax_b as tax_b
    from main_table as t1,
         temp_a as t2,
         temp_b as t3
    where t1.id = t2.id
      and t2.id = t3.id
      and t1.month = t2.month
      and t2.month = t3.month
      and t1.year = t2.year
      and t2.year = t3.year;

quit;

On some test data, this gives the same results as your method. My SQL step takes 0.03 seconds, versus 0.65 seconds for yours.
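One way to check that the two approaches agree on your own data is PROC COMPARE. The sketch below assumes both final_table and final_table2 exist in WORK and that id, year, and month together identify a row.

```sas
/* Put both result tables in the same row order before comparing. */
proc sort data=final_table;
    by id year month;
run;

proc sort data=final_table2;
    by id year month;
run;

/* Match rows on the key variables and report any differing values. */
proc compare base=final_table compare=final_table2;
    id id year month;
run;
```

Note that the join above only returns rows where all three tables have the same id, year, and month. If a tax table has no row for some month that appears in main_table, that month will be missing from final_table2 but present in final_table, and PROC COMPARE will report it as an unmatched observation.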