Summing a variable while performing a left join

Time: 2014-12-29 18:17:38

Tags: sql sas left-join

Updated example code below, using SQL in SAS:

proc sql;
  create table add_losses as 
  select *, 
    sum(bb.gross_loss) as gl format = comma15.2, 
    count(bb.gross_loss) as n_losses
  from add_startend as aa   
  left join LED as bb 
    on (aa.process_name = bb.process_name and
        aa.group_id = bb.group_code and
        aa.start_date le bb.first_loss_posting_date le aa.end_date)

  group by aa.process_name, aa.group_id, aa.start_date, aa.end_date
  order by aa.process_name, aa.group_id, aa.start_date, aa.end_date;
quit;

Sample data and desired output below:

Table AA

variable 1  variable 2  start date  end date
AAAA        BBB         1/1/2010    6/1/2010

Table BB

variable 1  variable 2  Date      losses
AAAA        BBB         1/5/2010    100
AAAA        BBB         2/1/2010    100
AAAA        BBB         3/5/2010    100
AAAA        BBB         4/23/2010   100
AAAA        BBB         5/11/2010   100
AAAA        BBB         5/25/2010   100

Table YY (current output)

variable 1  variable 2  Date    gross_loss  gl  n_losses
AAAA        BBB         1/5/2010    100     600 6
AAAA        BBB         2/1/2010    100     600 6
AAAA        BBB         3/5/2010    100     600 6
AAAA        BBB         4/23/2010   100     600 6
AAAA        BBB         5/11/2010   100     600 6
AAAA        BBB         5/25/2010   100     600 6

Table XX (desired output)

variable 1  variable 2  start date  end date    gl  n_losses
AAAA        BBB         1/1/2010    6/1/2010    600     6

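The problem is that the current code produces extra observations. I want to keep the same number of rows and all of the variables from table AA, while adding the columns 'gl' and 'n_losses'.

3 answers:

Answer 0: (score: 0)

I'd suggest googling for an SQL tutorial, specifically one that covers the group by statement. Once you understand how group by works, look at the SQL aggregate functions count() and sum().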

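Edit:

I'd suggest the following changes:

  • When using a group-by statement, explicitly list the variables you are grouping by in the SELECT statement (rather than just using *).
  • Use the between operator for the date comparison, as it is easier to read.
  • Always make sure every column in the select statement is either inside an aggregate function or in the group by clause. Otherwise PROC SQL remerges the summary statistics back onto the detail rows, which is where the extra observations come from.

Create the sample data: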

data a;
  informat start_date end_date mmddyy10.;
  format start_date end_date yymmdd10.;
  input variable_1 $
        variable_2 $
        start_date 
        end_date
        ;
  datalines;
AAAA        BBB         1/1/2010    6/1/2010
run;

data b;
  informat date mmddyy10.;
  format date yymmdd10.;
  input variable_1 $
        variable_2 $
        date 
        losses
        ;
  datalines;
AAAA        BBB         1/5/2010    100
AAAA        BBB         2/1/2010    100
AAAA        BBB         3/5/2010    100
AAAA        BBB         4/23/2010   100
AAAA        BBB         5/11/2010   100
AAAA        BBB         5/25/2010   100
run;

Final query:

proc sql;
  create table add_losses as 
  select a.variable_1, 
         a.variable_2,
         a.start_date,
         a.end_date,
         count(b.variable_1) as n_losses,
         sum(b.losses) as gl format=comma15.2
  from a
  left join b on a.variable_1 eq b.variable_1
             and a.variable_2 eq b.variable_2
             and b.date between a.start_date and a.end_date
  group by 1,2,3,4
  order by 1,2,3,4
  ;
quit;

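Note that I used positional shorthand (column numbers) in the group-by statement, because I find it easier to write, maintain and read. Alternatively, you can list the columns explicitly, like this: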

group by a.variable_1, a.variable_2, a.start_date, a.end_date
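
As a guard against the original problem, here is a minimal sketch that assumes the sample tables a and b created above. The NOREMERGE option on the PROC SQL statement asks SAS to refuse any query that would need to remerge summary statistics with the detail rows, so a stray ungrouped column should show up as an error in the log rather than as extra observations. The table name check_remerge is made up for illustration.

```sas
/* Assumes tables a and b from the data steps above.            */
/* check_remerge is a hypothetical name used only for this test. */
proc sql noremerge;
  create table check_remerge as
  select a.*,
         b.date,                 /* neither grouped nor aggregated */
         sum(b.losses) as gl
  from a
  left join b on a.variable_1 eq b.variable_1
             and a.variable_2 eq b.variable_2
             and b.date between a.start_date and a.end_date
  group by a.variable_1, a.variable_2, a.start_date, a.end_date;
quit;
```

With the default settings the same query would run and return one row per loss, each carrying the group total, which is the shape of the "current output" table in the question. Removing b.date from the select list should make the query acceptable under NOREMERGE.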

Answer 1: (score: 0)

A query like the one below will give you your result (it may vary slightly depending on the database technology):

select aa.variable1,
       aa.variable2,
       aa.start_date,
       aa.end_date,
       sum(bb.losses) as gl,
       /* count a bb column, not *, so aa rows with no losses get 0 */
       count(bb.losses) as n_losses
  from aa
       /* left join keeps aa rows that have no losses in the period */
       left join bb on (
              aa.variable1 = bb.variable1 and
              aa.variable2 = bb.variable2 and
              bb.date >= aa.start_date and
              bb.date <= aa.end_date)
 group by aa.variable1, aa.variable2, aa.start_date, aa.end_date
 order by aa.variable1, aa.variable2, aa.start_date, aa.end_date
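
One detail that is easy to miss when the question's left join is combined with a count: count(*) counts joined rows, while count(column) counts non-missing values. The sketch below uses two small made-up tables (periods and events, with hypothetical columns id, lo, hi, t and amount) to show the difference for a period that has no matching events.

```sas
/* Hypothetical toy tables, not the asker's data. */
data periods;
  input id $ lo hi;
  datalines;
P1 1 10
P2 20 30
;
run;

data events;
  input id $ t amount;
  datalines;
P1 5 100
P1 7 100
;
run;

proc sql;
  select p.id,
         count(*)        as n_rows,    /* joined rows, including the unmatched one */
         count(e.amount) as n_events,  /* non-missing amounts only                 */
         sum(e.amount)   as total
  from periods as p
  left join events as e
    on p.id = e.id and e.t between p.lo and p.hi
  group by p.id;
quit;
```

P2 has no events, so it should come back with n_rows = 1, n_events = 0 and a missing total, while P1 shows 2, 2 and 200. For a losses count, the n_events form is the one that matches the question.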

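Answer 2: (score: 0)

You can use a BETWEEN clause for this kind of condition, and join the aggregated result back to the original table so that all of its columns are kept: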

proc sql;
  create table add_losses as 
  select aa.*, 
         agg.gl format = comma15.2, 
         agg.n_losses
  from add_startend as aa
  left join
    (
      /* one row per process / group / period, with the totals */
      select aa.process_name, aa.group_id, aa.start_date, aa.end_date, 
             sum(bb.gross_loss) as gl format = comma15.2, 
             count(bb.gross_loss) as n_losses
      from add_startend as aa     
      left join LED as bb 
        on (aa.process_name = bb.process_name and
            aa.group_id = bb.group_code and
            bb.first_loss_posting_date between aa.start_date and aa.end_date)
      group by aa.process_name, aa.group_id, aa.start_date, aa.end_date
    ) as agg
    /* the subquery exposes group_id, not group_code */
    on  aa.process_name = agg.process_name 
    and aa.group_id = agg.group_id
    and aa.start_date = agg.start_date
    and aa.end_date = agg.end_date
  order by aa.process_name, aa.group_id, aa.start_date, aa.end_date;
quit;
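
A different technique that avoids the second join altogether is a pair of correlated scalar subqueries in the select list. This is only a sketch, not what the answer above does; it assumes the asker's tables add_startend and LED and the column names from the question, and it may well be slower than the join on large tables.

```sas
/* Sketch only: assumes add_startend and LED as described in the question. */
proc sql;
  create table add_losses as
  select aa.*,
         /* total loss inside this row's date window */
         (select sum(bb.gross_loss)
            from LED as bb
           where bb.process_name = aa.process_name
             and bb.group_code = aa.group_id
             and bb.first_loss_posting_date between aa.start_date and aa.end_date
         ) as gl format = comma15.2,
         /* number of non-missing losses inside the same window */
         (select count(bb.gross_loss)
            from LED as bb
           where bb.process_name = aa.process_name
             and bb.group_code = aa.group_id
             and bb.first_loss_posting_date between aa.start_date and aa.end_date
         ) as n_losses
  from add_startend as aa
  order by aa.process_name, aa.group_id, aa.start_date, aa.end_date;
quit;
```

Because there is no group by and no join in the outer query, the result has exactly one row per row of add_startend, with every original column kept. Rows with no losses in range should get a missing gl and n_losses = 0.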