Updated sample code below (SQL in SAS, via PROC SQL):
proc sql;
create table add_losses as
select *,   /* select * with GROUP BY: SAS remerges the group totals onto every joined detail row */
sum(bb.gross_loss) as gl format = comma15.2,
count(bb.gross_loss) as n_losses
from add_startend as aa
left join LED as bb
on (aa.process_name = bb.process_name and
aa.group_id = bb.group_code and
aa.start_date le bb.first_loss_posting_date le aa.end_date)
group by aa.process_name, aa.group_id, aa.start_date, aa.end_date
order by aa.process_name, aa.group_id, aa.start_date, aa.end_date;
quit;
Sample data and desired output below:
Table AA
variable_1  variable_2  start_date  end_date
AAAA        BBB         1/1/2010    6/1/2010
Table BB
variable_1  variable_2  date        losses
AAAA        BBB         1/5/2010    100
AAAA        BBB         2/1/2010    100
AAAA        BBB         3/5/2010    100
AAAA        BBB         4/23/2010   100
AAAA        BBB         5/11/2010   100
AAAA        BBB         5/25/2010   100
Table YY (current output)
variable_1  variable_2  date        gross_loss  gl   n_losses
AAAA        BBB         1/5/2010    100         600  6
AAAA        BBB         2/1/2010    100         600  6
AAAA        BBB         3/5/2010    100         600  6
AAAA        BBB         4/23/2010   100         600  6
AAAA        BBB         5/11/2010   100         600  6
AAAA        BBB         5/25/2010   100         600  6
Table XX (desired output)
variable_1  variable_2  start_date  end_date  gl   n_losses
AAAA        BBB         1/1/2010    6/1/2010  600  6
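The problem is that the current code produces extra observations. I want to keep the same number of rows and all the variables from table AA, while adding the columns 'gl' and 'n_losses'.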
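The extra rows come from combining select * with GROUP BY: columns that are neither grouped nor aggregated make SAS remerge the group totals back onto every detail row (the log notes that the query requires remerging summary statistics). The sketch below reproduces that shape in Python's built-in sqlite3 using window functions (SUM/COUNT ... OVER (PARTITION BY ...)). This is an assumed analogue of the SAS remerge, not SAS itself; it also assumes SQLite 3.25 or later (for window functions) and stores dates as ISO 'YYYY-MM-DD' text so BETWEEN compares them correctly.

```python
import sqlite3

# Sample data from tables AA and BB; dates stored as ISO text (an assumption).
AA = [("AAAA", "BBB", "2010-01-01", "2010-06-01")]
LOSS_DATES = ["2010-01-05", "2010-02-01", "2010-03-05",
              "2010-04-23", "2010-05-11", "2010-05-25"]
BB = [("AAAA", "BBB", d, 100) for d in LOSS_DATES]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aa (variable_1 TEXT, variable_2 TEXT, start_date TEXT, end_date TEXT)")
conn.execute("CREATE TABLE bb (variable_1 TEXT, variable_2 TEXT, date TEXT, losses REAL)")
conn.executemany("INSERT INTO aa VALUES (?, ?, ?, ?)", AA)
conn.executemany("INSERT INTO bb VALUES (?, ?, ?, ?)", BB)

# Window aggregates keep every joined detail row and attach the group totals
# to each one: the same row shape as Table YY.
PARTITION = "PARTITION BY a.variable_1, a.variable_2, a.start_date, a.end_date"
remerged = conn.execute(f"""
    SELECT b.variable_1, b.variable_2, b.date, b.losses,
           SUM(b.losses)   OVER ({PARTITION}) AS gl,
           COUNT(b.losses) OVER ({PARTITION}) AS n_losses
    FROM aa AS a
    LEFT JOIN bb AS b
      ON a.variable_1 = b.variable_1
     AND a.variable_2 = b.variable_2
     AND b.date BETWEEN a.start_date AND a.end_date
    ORDER BY b.date
""").fetchall()

for row in remerged:
    print(row)  # six rows, each ending with gl = 600.0 and n_losses = 6
```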
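Answer 0 (score: 0)
I suggest searching Google for SQL tutorials, in particular ones that cover the GROUP BY clause. Once you understand how GROUP BY works, look at the SQL aggregate functions count() and sum().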
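As a minimal illustration of that advice, the sketch below uses Python's built-in sqlite3 as a stand-in for PROC SQL (dates stored as ISO text are an assumption). It selects only the grouping columns plus sum() and count(), so the six rows of table BB collapse into one row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bb (variable_1 TEXT, variable_2 TEXT, date TEXT, losses REAL)")
conn.executemany(
    "INSERT INTO bb VALUES (?, ?, ?, ?)",
    [("AAAA", "BBB", d, 100) for d in
     ["2010-01-05", "2010-02-01", "2010-03-05", "2010-04-23", "2010-05-11", "2010-05-25"]],
)

# Only grouping columns and aggregates appear in the SELECT list,
# so there is exactly one output row per (variable_1, variable_2) group.
summary = conn.execute("""
    SELECT variable_1, variable_2, SUM(losses) AS gl, COUNT(losses) AS n_losses
    FROM bb
    GROUP BY variable_1, variable_2
""").fetchall()
print(summary)  # [('AAAA', 'BBB', 600.0, 6)]
```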
Edit:
I suggest the following changes:
Create the sample data:
data a;
informat start_date end_date mmddyy10.;   /* read m/d/yyyy text as SAS dates */
format start_date end_date yymmdd10.;     /* display as yyyy-mm-dd */
input variable_1 $
variable_2 $
start_date
end_date
;
datalines;
AAAA BBB 1/1/2010 6/1/2010
;
run;
data b;
informat date mmddyy10.;   /* read m/d/yyyy text as a SAS date */
format date yymmdd10.;     /* display as yyyy-mm-dd */
input variable_1 $
variable_2 $
date
losses
;
datalines;
AAAA BBB 1/5/2010 100
AAAA BBB 2/1/2010 100
AAAA BBB 3/5/2010 100
AAAA BBB 4/23/2010 100
AAAA BBB 5/11/2010 100
AAAA BBB 5/25/2010 100
;
run;
Final query:
proc sql;
create table add_losses as
select a.variable_1,
a.variable_2,
a.start_date,
a.end_date,
count(b.variable_1) as n_losses,   /* counts matched rows only: 0 for a period with no losses */
sum(b.losses) as gl format=comma15.2
from a
left join b on a.variable_1 eq b.variable_1   /* left join keeps every row of a */
and a.variable_2 eq b.variable_2
and b.date between a.start_date and a.end_date
group by 1,2,3,4
order by 1,2,3,4
;
quit;
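To check the final query's behaviour outside SAS, here is a sketch of the same LEFT JOIN plus GROUP BY in Python's sqlite3 (ISO text dates assumed). The second period in table a (2010-07-01 to 2010-12-31) is a hypothetical row added only to show that a period with no losses is kept, with n_losses = 0 and gl = NULL (a missing value in SAS).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (variable_1 TEXT, variable_2 TEXT, start_date TEXT, end_date TEXT)")
conn.execute("CREATE TABLE b (variable_1 TEXT, variable_2 TEXT, date TEXT, losses REAL)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", [
    ("AAAA", "BBB", "2010-01-01", "2010-06-01"),
    ("AAAA", "BBB", "2010-07-01", "2010-12-31"),  # hypothetical period with no losses
])
conn.executemany("INSERT INTO b VALUES (?, ?, ?, ?)", [
    ("AAAA", "BBB", d, 100) for d in
    ["2010-01-05", "2010-02-01", "2010-03-05", "2010-04-23", "2010-05-11", "2010-05-25"]
])

# Same shape as the PROC SQL query: one output row per row of a.
result = conn.execute("""
    SELECT a.variable_1, a.variable_2, a.start_date, a.end_date,
           COUNT(b.variable_1) AS n_losses,
           SUM(b.losses)       AS gl
    FROM a
    LEFT JOIN b
      ON a.variable_1 = b.variable_1
     AND a.variable_2 = b.variable_2
     AND b.date BETWEEN a.start_date AND a.end_date
    GROUP BY a.variable_1, a.variable_2, a.start_date, a.end_date
    ORDER BY a.variable_1, a.variable_2, a.start_date, a.end_date
""").fetchall()

for row in result:
    print(row)
# ('AAAA', 'BBB', '2010-01-01', '2010-06-01', 6, 600.0)
# ('AAAA', 'BBB', '2010-07-01', '2010-12-31', 0, None)
```

If you would rather see 0 than a missing gl, coalesce(sum(b.losses), 0) works in both SQLite and PROC SQL.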
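Note that I used positional shorthand in the GROUP BY clause (1, 2, 3, 4 refer to the first four columns of the SELECT list) because it is easier to write, maintain, and read. Alternatively, you can list the columns explicitly: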
group by a.variable_1, a.variable_2, a.start_date, a.end_date
Answer 1 (score: 0)
A query like the following will give your result (the syntax may differ slightly depending on your database):
select aa.variable1,
aa.variable2,
aa.start_date,
aa.end_date,
sum(bb.losses) as gl,
count(bb.date) as n_losses   /* count a bb column, not *, so periods without losses get 0 */
from aa
left join bb on (            /* left join keeps every row of aa */
aa.variable1 = bb.variable1 and
aa.variable2 = bb.variable2 and
bb.date >= aa.start_date and
bb.date <= aa.end_date)
group by aa.variable1, aa.variable2, aa.start_date, aa.end_date
order by aa.variable1, aa.variable2, aa.start_date, aa.end_date;
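Two details decide whether AA keeps all its rows: the join type and what count() counts. The sketch below (Python sqlite3 as a stand-in, ISO text dates, plus a hypothetical second period with no losses as the test case) runs the same query with INNER JOIN and with LEFT JOIN, reporting both count(*) and count(bb.date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aa (variable1 TEXT, variable2 TEXT, start_date TEXT, end_date TEXT)")
conn.execute("CREATE TABLE bb (variable1 TEXT, variable2 TEXT, date TEXT, losses REAL)")
conn.executemany("INSERT INTO aa VALUES (?, ?, ?, ?)", [
    ("AAAA", "BBB", "2010-01-01", "2010-06-01"),
    ("AAAA", "BBB", "2010-07-01", "2010-12-31"),  # hypothetical period with no losses
])
loss_dates = ["2010-01-05", "2010-02-01", "2010-03-05",
              "2010-04-23", "2010-05-11", "2010-05-25"]
conn.executemany("INSERT INTO bb VALUES (?, ?, ?, ?)",
                 [("AAAA", "BBB", d, 100) for d in loss_dates])

# {join} is filled in with INNER JOIN or LEFT JOIN below.
QUERY = """
    SELECT aa.start_date, COUNT(*) AS n_star, COUNT(bb.date) AS n_col
    FROM aa
    {join} bb
      ON aa.variable1 = bb.variable1
     AND aa.variable2 = bb.variable2
     AND bb.date >= aa.start_date AND bb.date <= aa.end_date
    GROUP BY aa.variable1, aa.variable2, aa.start_date, aa.end_date
    ORDER BY aa.start_date
"""
inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
print("inner:", inner_rows)  # [('2010-01-01', 6, 6)]
print("left: ", left_rows)   # [('2010-01-01', 6, 6), ('2010-07-01', 1, 0)]
```

With INNER JOIN the empty period disappears. With LEFT JOIN it survives, but count(*) reports 1 for it because the unmatched row still exists (filled with NULLs), while count(bb.date) correctly reports 0.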
Answer 2 (score: 0)
You can use a BETWEEN clause for this kind of condition:
proc sql;
create table add_losses as
select aa.*, agg.gl format = comma15.2, agg.n_losses
from
add_startend as aa
left join
(
/* one row per period: totals of the losses posted within it */
select aa.process_name, aa.group_id, aa.start_date, aa.end_date,
sum(bb.gross_loss) as gl,
count(bb.gross_loss) as n_losses
from add_startend as aa
left join LED as bb
on (aa.process_name = bb.process_name and
aa.group_id = bb.group_code and
bb.first_loss_posting_date between aa.start_date and aa.end_date)
group by aa.process_name, aa.group_id, aa.start_date, aa.end_date
) as agg
/* join the totals back on the full key so every column of add_startend is kept */
on aa.process_name = agg.process_name
and aa.group_id = agg.group_id
and aa.start_date = agg.start_date
and aa.end_date = agg.end_date
order by aa.process_name, aa.group_id, aa.start_date, aa.end_date;
quit;
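The pattern in this answer is: aggregate first, then join the totals back on the full key. Below is a sketch of that query in Python's sqlite3, with lower-case table names, ISO text dates, and a hypothetical extra column owner standing in for "all the other variables" of the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# owner is a hypothetical extra column, included to show that aa.* survives.
conn.execute("""CREATE TABLE add_startend (
    process_name TEXT, group_id TEXT, start_date TEXT, end_date TEXT, owner TEXT)""")
conn.execute("""CREATE TABLE led (
    process_name TEXT, group_code TEXT, first_loss_posting_date TEXT, gross_loss REAL)""")
conn.execute("INSERT INTO add_startend VALUES ('AAAA', 'BBB', '2010-01-01', '2010-06-01', 'team_x')")
conn.executemany("INSERT INTO led VALUES (?, ?, ?, ?)", [
    ("AAAA", "BBB", d, 100) for d in
    ["2010-01-05", "2010-02-01", "2010-03-05", "2010-04-23", "2010-05-11", "2010-05-25"]
])

rows = conn.execute("""
    SELECT aa.*, agg.gl, agg.n_losses
    FROM add_startend AS aa
    LEFT JOIN (
        -- step 1: one summary row per period
        SELECT s.process_name, s.group_id, s.start_date, s.end_date,
               SUM(l.gross_loss) AS gl, COUNT(l.gross_loss) AS n_losses
        FROM add_startend AS s
        LEFT JOIN led AS l
          ON s.process_name = l.process_name
         AND s.group_id = l.group_code
         AND l.first_loss_posting_date BETWEEN s.start_date AND s.end_date
        GROUP BY s.process_name, s.group_id, s.start_date, s.end_date
    ) AS agg
      -- step 2: attach the totals to the base rows on the full key
      ON aa.process_name = agg.process_name
     AND aa.group_id = agg.group_id
     AND aa.start_date = agg.start_date
     AND aa.end_date = agg.end_date
    ORDER BY aa.process_name, aa.group_id, aa.start_date, aa.end_date
""").fetchall()
print(rows)  # [('AAAA', 'BBB', '2010-01-01', '2010-06-01', 'team_x', 600.0, 6)]
```

This keeps exactly one output row per row of the base table as long as the key (process_name, group_id, start_date, end_date) is unique there; duplicate keys would multiply rows in the join back.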