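Aggregating with PROC MEANS in SAS

Date: 2018-03-14 12:02:54

Tags: sas proc sas-studio

For a project, I have a large dataset of 1.5 million entries, and I want to aggregate some car loan data by a set of grouping variables such as:

Country, Currency, ID, Fixed or Floating, Performing, Initial loan value, Car type, Car make

I would like to know whether I can aggregate the data by summing the numeric initial loan values and collapsing rows that share the same values of the other variables into a single row, so that the first dataset below is converted into the second: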

Country Currency ID Fixed_or_Floating Performing Initial_Value Current_Value 

data have;
/* LENGTH keeps "Non-Performing" from being truncated to 8 characters */
length country $2 currency $3 ID 8 Fixed $8 performing $14 initial current 8;
input country $ currency $ ID Fixed $ performing $ initial current;
datalines;
UK       GBP     1   Fixed     Performing          100    50
UK       GBP     1   Fixed     Performing          150    30
UK       GBP     1   Fixed     Performing          160    70
UK       GBP     1   Floating Performing          150    30
UK       GBP     1   Floating Performing          115    80
UK       GBP     1   Floating Performing          110    60
UK       GBP     1   Fixed     Non-Performing   100    50
UK       GBP     1   Fixed     Non-Performing   120    30
;
run;

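data want;
/* Expected result: one row per group, with initial and current summed */
length country $2 currency $3 ID 8 Fixed $8 performing $14 initial current 8;
input country $ currency $ ID Fixed $ performing $ initial current;
datalines;
UK GBP 1 Fixed Performing 410 150
UK GBP 1 Floating Performing 375 170
UK GBP 1 Fixed Non-Performing 220 80
;
run;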

Basically, I am looking for a way to sum the numeric values while concatenating the character variables.

I have tried this code:

proc means data=have sum;
var initial current;
by country currency id fixed performing;
run;

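I am not sure whether I have to use PROC SQL (too slow for such a large dataset) or perhaps a data step.

Any help with the concatenation would be appreciated.

3 answers:

Answer 0: (score: 1)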

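Have Proc MEANS create an output data set, then concatenate the variables in the result. MEANS with a BY statement requires sorted data; your have is not sorted.

The aggregation keys (those lovely categorical variables) can be concatenated into a single space-separated key with the CATX function (not sure why you need to do this).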

data have_unsorted;
length country $2 currency $3 id 8 type $8 evaluation $20 initial current 8;
input country currency ID type evaluation initial current;
datalines;
UK       GBP     1   Fixed     Performing          100    50    
UK       GBP     1   Fixed     Performing          150    30   
UK       GBP     1   Fixed     Performing          160    70   
UK       GBP     1   Floating Performing          150    30   
UK       GBP     1   Floating Performing          115    80   
UK       GBP     1   Floating Performing          110    60   
UK       GBP     1   Fixed     Non-Performing   100    50 
UK       GBP     1   Fixed     Non-Performing   120    30  
;
run;

Way 1 - MEANS with CLASS / WAYS / OUTPUT, post-processed with a data step

The cardinality of the class variables may cause problems.

proc means data=have_unsorted noprint;
  class country currency ID type evaluation ;
  ways 5;
  output out=sums sum(initial current)= / autoname;
run;

data want;
  set sums;
  key = catx(' ',country,currency,ID,type,evaluation);
  keep key initial_sum current_sum;
run;
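
As a side note to Way 1, the NWAY option on the PROC MEANS statement should give the same rows as WAYS 5 here, since it keeps only the combinations in which every CLASS variable takes part. A minimal sketch, assuming the have_unsorted data set above; the output name sums_nway is made up for illustration:

```sas
/* NWAY keeps only the highest _TYPE_ value, i.e. the rows where
   all five CLASS variables contribute to the grouping. */
proc means data=have_unsorted noprint nway;
  class country currency ID type evaluation;
  output out=sums_nway sum(initial current)= / autoname;
run;
```

Rows with a missing value in any CLASS variable are left out unless the MISSING option is added, which is worth checking on real loan data.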

Way 2 - SORT followed by MEANS with BY / OUTPUT, post-processed with a data step

The BY statement requires sorted data.

proc sort data=have_unsorted out=have;
  by country currency ID type evaluation ;
run;

proc means data=have noprint;
  by country currency ID type evaluation ;
  output out=sums sum(initial current)= / autoname;
run;

data want;
  set sums;
  key = catx(' ',country,currency,ID,type,evaluation);
  keep key initial_sum current_sum;
run;

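Way 3 - MEANS on data that is grouped but not sorted, with BY NOTSORTED / OUTPUT, post-processed with a data step

The have rows will be processed in clumps of the BY variables. A clump is a sequence of contiguous rows that belong to the same BY group.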

proc means data=have_unsorted noprint;
  by country currency ID type evaluation NOTSORTED;
  output out=sums sum(initial current)= / autoname;
run;

data want;
  set sums;
  key = catx(' ',country,currency,ID,type,evaluation);
  keep key initial_sum current_sum;
run;
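
One consequence of clump processing is that a group that shows up in two separate clumps is summed twice, once per clump. The sketch below illustrates this with a small made-up data set (clumps, grp, x and clump_sums are hypothetical names, not part of the question):

```sas
/* Hypothetical data: group A appears in two non-adjacent clumps */
data clumps;
  input grp $ x;
  datalines;
A 1
A 2
B 5
A 10
;
run;

proc means data=clumps noprint;
  by grp notsorted;
  output out=clump_sums sum(x)= / autoname;
run;
/* clump_sums holds three rows (A=3, B=5, A=10), not two */
```

So BY NOTSORTED only yields one row per group when every group is stored contiguously.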

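Way 4 - Data step with a DOW loop, BY NOTSORTED, and key construction

The have rows will be processed in clumps of the BY variables. A clump is a sequence of contiguous rows that belong to the same BY group.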

data want_way4;
  do until (last.evaluation);
    set have;
    by country currency ID type evaluation NOTSORTED;
    initial_sum = SUM(initial_sum, initial);
    current_sum = SUM(current_sum, current);
  end;
  key = catx(' ',country,currency,ID,type,evaluation); 
  keep key initial_sum current_sum;
run;

Way 5 - Data step hash

The data can be processed without being presorted or clumped. In other words, the data may be completely disordered.

data _null_;
  length key $50 initial_sum current_sum 8;

  if _n_ = 1 then do;
    call missing (key, initial_sum, current_sum);

    declare hash sums();
    sums.defineKey('key');
    sums.defineData('key','initial_sum','current_sum');
    sums.defineDone();
  end;

  set have_unsorted end=end;
  key = catx(' ',country,currency,ID,type,evaluation); 

  rc = sums.find();
  initial_sum = SUM(initial_sum, initial);
  current_sum = SUM(current_sum, current);
  sums.replace();

  if end then
    sums.output(dataset:'have_way5');
run;
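
Since the question mentions PROC SQL, a rough GROUP BY equivalent is sketched here for comparison only. It assumes the have_unsorted data set above; want_sql is a hypothetical output name, and nothing is claimed about how its speed compares with the ways above on 1.5 million rows.

```sas
/* One output row per distinct combination of the five grouping columns */
proc sql;
  create table want_sql as
  select country, currency, ID, type, evaluation,
         sum(initial) as initial_sum,
         sum(current) as current_sum
  from have_unsorted
  group by country, currency, ID, type, evaluation;
quit;
```

Like the hash approach, this does not need the input to be sorted or clumped beforehand.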

Answer 1: (score: 0)

1.5 million entries is not a very large dataset. Sort the dataset first.

proc sort data=have;
by country currency id fixed performing;
run;

proc means data=have sum;
var initial current;
by country currency id fixed performing;
output out=sum(drop=_:) sum(initial)=Initial sum(current)=Current;
run;

Answer 2: (score: -1)

Props to Paige Miller

proc summary data=testa nway;
var  net_balance;
class ID fixed_or_floating performing_status initial country currency ;
output out=sumtest sum=sum_initial;
run;