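For a project, I have a large dataset of 1.5 million entries, and I want to aggregate some car loan data by a set of grouping variables, such as:
Country, Currency, ID, Fixed or Floating, Performing, Initial Loan Value, Car Type, Car Make
I would like to know whether I can aggregate the data by summing the numeric loan values and collapsing observations that share the same values of the other variables into a single row, so that the first dataset below is turned into the second.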
Country Currency ID Fixed_or_Floating Performing Initial_Value Current_Value
data have;
/* :$14. keeps "Non-Performing" from being truncated to the default 8 characters */
input country $ currency $ ID Fixed $ performing :$14. initial current;
datalines;
UK GBP 1 Fixed Performing 100 50
UK GBP 1 Fixed Performing 150 30
UK GBP 1 Fixed Performing 160 70
UK GBP 1 Floating Performing 150 30
UK GBP 1 Floating Performing 115 80
UK GBP 1 Floating Performing 110 60
UK GBP 1 Fixed Non-Performing 100 50
UK GBP 1 Fixed Non-Performing 120 30
;
run;
data want;
/* :$14. keeps "Non-Performing" from being truncated to the default 8 characters */
input country $ currency $ ID Fixed $ performing :$14. initial current;
datalines;
UK GBP 1 Fixed Performing 410 150
UK GBP 1 Floating Performing 375 170
UK GBP 1 Fixed Non-Performing 220 80
;
run;
Basically, I am looking for a way to sum the numeric values while concatenating the character variables.
I have tried this code:
proc means data=have sum;
var initial current;
by country currency id fixed performing;
run;
I am not sure whether I have to use proc sql (too slow for a dataset this large) or perhaps a data step.
Any help with the concatenation would be appreciated.
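Answer 0 (score: 1)
Create an output data set from Proc MEANS and concatenate the variables in the result. MEANS with a BY statement requires sorted data. Your have is not sorted.
Use the CATX function to concatenate the aggregation key (those lovely categorical variables) into a single space-delimited key (not sure why you need to do that).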
data have_unsorted;
length country $2 currency $3 id 8 type $8 evaluation $20 initial current 8;
input country currency ID type evaluation initial current;
datalines;
UK GBP 1 Fixed Performing 100 50
UK GBP 1 Fixed Performing 150 30
UK GBP 1 Fixed Performing 160 70
UK GBP 1 Floating Performing 150 30
UK GBP 1 Floating Performing 115 80
UK GBP 1 Floating Performing 110 60
UK GBP 1 Fixed Non-Performing 100 50
UK GBP 1 Fixed Non-Performing 120 30
;
run;
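To see what CATX does on its own before it is applied to the aggregated output, here is a minimal sketch with literal values (the values are only illustrative). CATX strips leading and trailing blanks from each argument, converts numeric arguments to text, and inserts the delimiter between the non-blank pieces.

```sas
data _null_;
  length key $50;
  /* trailing blanks on 'Fixed   ' are stripped; the numeric 1 becomes '1' */
  key = catx(' ', 'UK', 'GBP', 1, 'Fixed   ', 'Performing');
  put key=;   /* writes key=UK GBP 1 Fixed Performing to the log */
run;
```

Declaring the length of key up front avoids relying on the default length given to a new variable assigned from CATX.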
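Way 1 - MEANS with CLASS / WAYS / OUTPUT, post-processed with a data step
High cardinality in the class variables may cause problems, because the number of distinct class combinations can become very large.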
proc means data=have_unsorted noprint;
class country currency ID type evaluation ;
ways 5;
output out=sums sum(initial current)= / autoname;
run;
data want;
set sums;
key = catx(' ',country,currency,ID,type,evaluation);
keep key initial_sum current_sum;
run;
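As a possible variation on Way 1, the NWAY option on the PROC MEANS statement restricts the output to the combination of all class variables, which with five class variables should give the same rows as WAYS 5. This is only a sketch; the output name sums_nway is made up for illustration.

```sas
proc means data=have_unsorted noprint nway;
  class country currency ID type evaluation;
  /* sums_nway is a hypothetical output name */
  output out=sums_nway sum(initial current)= / autoname;
run;
```

The same post-processing data step with CATX can then read sums_nway instead of sums.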
Way 2 - SORT followed by MEANS with BY / OUTPUT, post-processed with a data step
The BY statement requires sorted data.
proc sort data=have_unsorted out=have;
by country currency ID type evaluation ;
run;
proc means data=have noprint;
by country currency ID type evaluation ;
output out=sums sum(initial current)= / autoname;
run;
data want;
set sums;
key = catx(' ',country,currency,ID,type,evaluation);
keep key initial_sum current_sum;
run;
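Way 3 - MEANS on data that is grouped but not sorted, using BY NOTSORTED / OUTPUT, post-processed with a data step
The rows will be processed in clumps of the BY variables. A clump is a sequence of contiguous rows that have the same by-group values. If the same group occurs in two separate clumps, it produces two output rows.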
proc means data=have_unsorted noprint;
by country currency ID type evaluation NOTSORTED;
output out=sums sum(initial current)= / autoname;
run;
data want;
set sums;
key = catx(' ',country,currency,ID,type,evaluation);
keep key initial_sum current_sum;
run;
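The clump behaviour can be illustrated with a small sketch. The dataset demo_clumps and its three rows are invented for this illustration and are not part of the question's data; the Fixed group deliberately appears in two separate clumps.

```sas
/* Hypothetical demo data: Fixed appears in two non-contiguous clumps */
data demo_clumps;
  length type $8;
  input type initial;
  datalines;
Fixed 100
Floating 150
Fixed 160
;
run;

proc means data=demo_clumps noprint;
  by type notsorted;
  output out=demo_sums sum(initial)= / autoname;
run;
/* demo_sums has three rows (Fixed 100, Floating 150, Fixed 160),
   not two, because each contiguous run is its own by-group */
```

So BY NOTSORTED only gives one row per group when every group's rows are already contiguous, as they are in the sample data.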
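Way 4 - DATA step, DOW loop, BY NOTSORTED and key construction
The rows of have will be processed in clumps of the BY variables. A clump is a sequence of contiguous rows that have the same by-group values. The DOW loop reads all rows of one clump within a single iteration of the data step, so the sums accumulate and one row is output per clump.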
data want_way4;
do until (last.evaluation);
set have;
by country currency ID type evaluation NOTSORTED;
initial_sum = SUM(initial_sum, initial);
current_sum = SUM(current_sum, current);
end;
key = catx(' ',country,currency,ID,type,evaluation);
keep key initial_sum current_sum;
run;
Way 5 - DATA step hash
The data can be processed without being pre-sorted or clumped. In other words, the data can be totally disordered.
data _null_;
length key $50 initial_sum current_sum 8;
if _n_ = 1 then do;
call missing (key, initial_sum, current_sum);
declare hash sums();
sums.defineKey('key');
sums.defineData('key','initial_sum','current_sum');
sums.defineDone();
end;
set have_unsorted end=end;
key = catx(' ',country,currency,ID,type,evaluation);
rc = sums.find();
initial_sum = SUM(initial_sum, initial);
current_sum = SUM(current_sum, current);
sums.replace();
if end then
sums.output(dataset:'have_way5');
run;
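The order of the rows written by a hash object's OUTPUT method depends on how the hash was declared. If a predictable order is wanted, one option is the ordered argument tag on the declaration. The following is a sketch of that variation, not the original author's code; the output name want_way5_ordered is made up, and the explicit reset of the sums when a key is not found is an added safeguard.

```sas
data _null_;
  length key $50 initial_sum current_sum 8;
  if _n_ = 1 then do;
    call missing(key, initial_sum, current_sum);
    declare hash sums(ordered:'a');   /* output rows in ascending key order */
    sums.defineKey('key');
    sums.defineData('key', 'initial_sum', 'current_sum');
    sums.defineDone();
  end;
  set have_unsorted end=end;
  key = catx(' ', country, currency, ID, type, evaluation);
  /* new key: start both sums from missing */
  if sums.find() ne 0 then call missing(initial_sum, current_sum);
  initial_sum = sum(initial_sum, initial);
  current_sum = sum(current_sum, current);
  sums.replace();
  /* want_way5_ordered is a hypothetical output name */
  if end then sums.output(dataset:'want_way5_ordered');
run;
```

The hash keeps one entry per distinct key in memory, so memory use grows with the number of groups rather than the number of rows.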
Answer 1 (score: 0)
1.5 million entries is not a very large dataset. Sort the dataset first.
proc sort data=have;
by country currency id fixed performing;
run;
proc means data=have sum;
var initial current;
by country currency id fixed performing;
output out=sum(drop=_:) sum(initial)=Initial sum(current)=Current;
run;
Answer 2 (score: -1)
Props to Paige Miller
proc summary data=testa nway;
var net_balance;
class ID fixed_or_floating performing_status initial country currency ;
output out=sumtest sum=sum_initial;
run;