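I have a table in SAS with, for example, a customer_id and 5 columns holding the customer's monthly status. There are 6 different possible statuses. For example: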
customer_id month1 month2 month3 month4 month5
12345678 Waiting Inactive Active Active Canceled
I want to return the single value that occurs most often across month1 - month5. In this case that value is Active. The result would be:
customer_id frequent
12345678 Active
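Is there a SAS function for this? I have some ideas on how to do it in SQL, but it would be complicated, with many CASE WHEN conditions and so on. I'm new to SAS, so I assume there is a better solution.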
Answer 0 (score: 2)
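If you use an array to split the dataset into one observation per month of each customer's history, you can then use the summary functions in proc sql to find the most frequent status easily, breaking ties in favor of the most recent month (assuming that is month 5).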
/* Reshape to one observation per customer per month */
data want1;
    set have;
    array m(*) month1 -- month5;
    do i = 1 to dim(m);
        cid = customer_id;
        frequent = m(i);
        position = i;
        output;
    end;
    keep cid frequent position;
run;

/* Count each status per customer and record the last month in which it occurred */
proc sql;
    create table want2 as select
        cid as customer_id,
        frequent,
        max(position) as max_pos,
        count(frequent) as count
    from want1
    group by cid, frequent;
quit;

/* Highest count first; ties are broken by the most recent month */
proc sort data = want2;
    by customer_id descending count descending max_pos;
run;

/* Keep only the top row for each customer */
data want3;
    set want2;
    by customer_id descending count descending max_pos;
    if first.customer_id;
    drop max_pos count;
run;
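The same idea can also be sketched in a single data step, without reshaping or sorting. This is only an illustration, not part of the answer above: it assumes the input dataset is called have, that the five columns are character variables named month1 to month5, and that every status fits in 8 characters. The names want, best, n, i and j are made up for the example. For each month it counts how many months share that status and keeps the status with the highest count; because the comparison uses >=, ties go to the status seen in the later month, which mirrors the tie-break used above.

```sas
/* Sample input in the layout shown in the question */
data have;
    input customer_id month1 :$8. month2 :$8. month3 :$8. month4 :$8. month5 :$8.;
    datalines;
12345678 Waiting Inactive Active Active Canceled
;
run;

data want;
    set have;
    array m(*) month1-month5;
    length frequent $8;          /* assumption: statuses are at most 8 characters */
    best = 0;                    /* highest count seen so far for this customer */
    do i = 1 to dim(m);
        n = 0;                   /* number of months whose status equals m(i) */
        do j = 1 to dim(m);
            if m(j) = m(i) then n = n + 1;
        end;
        /* >= lets a later month win when two statuses have the same count */
        if n >= best then do;
            best = n;
            frequent = m(i);
        end;
    end;
    keep customer_id frequent;
run;
```

With the sample row, Active occurs twice and every other status once, so frequent should come out as Active. The nested loop does 25 comparisons per row for 5 months, which is negligible here but grows quadratically if many more month columns are added.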
Answer 1 (score: 0)
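A somewhat crude solution, but it does work when there are only 2 distinct values, here over 5 months. If the count of Active is >= 3, then Active is the most frequent value: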
select customer_id,
       case when (case when month1 = 'Active' then 1 else 0 end +
                  case when month2 = 'Active' then 1 else 0 end +
                  case when month3 = 'Active' then 1 else 0 end +
                  case when month4 = 'Active' then 1 else 0 end +
                  case when month5 = 'Active' then 1 else 0 end) >= 3
            then 'Active' else 'Waiting' end as frequent
from tablename
Another way, using UNION ALL:
-- returns the single most frequent (customer_id, status) row overall
select customer_id, status, count(*) as cnt
from
(
    select customer_id, month1 as status from tablename
    UNION ALL
    select customer_id, month2 from tablename
    UNION ALL
    select customer_id, month3 from tablename
    UNION ALL
    select customer_id, month4 from tablename
    UNION ALL
    select customer_id, month5 from tablename
) t
group by customer_id, status
order by cnt desc
fetch first 1 row only
FETCH FIRST is ANSI SQL; some DBMS products use TOP or LIMIT instead.