Question

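I'm a new SAS/SQL user. I have a data set in which I need to turn some rows into columns. I suspect there is a faster or simpler way to do this than what I have in mind, so I'd like some advice. My example will explain the problem better: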

Here is my data set:

Month   ID     Car      Claim_Type   Cost_of_claim
  1    1243   Ferrari    Collision      12,000
  2    6437   Peugeot    Fire           50,000
  5    0184   Citroen    Stole           3,000
  9    1930   Fiat       Medical         1,000
  3    2934   GM         Liability      20,000

I need to create a data set like this:

Month    ID   Car       Collision      Fire     Stole   Medical Liability
    1  1243   Ferrari      12,000         0         0         0         0
    2  6437   Peugeot           0    50,000         0         0         0
    5  0184   Citroen           0         0     3,000         0         0
    9  1930   Fiat              0         0         0     1,000         0
    3  2934   GM                0         0         0         0    20,000

I'm just turning some rows into columns...

I was thinking of doing something like this to create my new data set:

proc sql;
select Month, ID, CAR,
  case when Claim_Type = 'Collision' then Cost_of_claim end Collision,
  case when Claim_Type = 'Fire'      then Cost_of_claim end Fire,
  case when Claim_Type = 'Stole'     then Cost_of_claim end Stole,
  case when Claim_Type = 'Medical'   then Cost_of_claim end Medical,
  case when Claim_Type = 'Liability' then Cost_of_claim end Liability
from my_table;
quit;

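The problem is that I have a lot of data, and I suspect this approach may not be efficient. Also, my real data set has many more columns and rows, and I don't want to type out every possible value in CASE WHEN expressions, because that code doesn't seem easy to maintain (or very user-friendly).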

Can anyone help me with this?

Answer 1

PROC TRANSPOSE should do what you want.

data test;
  input Month   ID     Car $     Claim_Type : $12. Cost_of_claim;
  cards;
  1    1243   Ferrari    Collision      12000
  2    6437   Peugeot    Fire           50000
  5    0184   Citroen    Stole           3000
  9    1930   Fiat       Medical         1000
  3    2934   GM         Liability      20000
;
run;

/* One output row per Month/ID/Car group (kept in input order by NOTSORTED),
   one column per distinct Claim_Type value, filled from Cost_of_claim */
proc transpose data=test out=transposed;
  by month id car notsorted;
  var cost_of_claim;
  id claim_type;
run;

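The output data set does not contain the off-diagonal zeros (those cells are missing), but if you really want them you can add them in a data step.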
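A minimal sketch of such a data step, assuming the TRANSPOSED data set produced by the PROC TRANSPOSE step above. It walks an array over every numeric column and replaces missing values with 0; since Month and ID are never missing, only the claim-type columns change. The output name TRANSPOSED_ZERO is hypothetical.

```sas
/* Hypothetical follow-up step: turn the missing claim cells into 0.
   Assumes TRANSPOSED from the PROC TRANSPOSE step above. */
data transposed_zero;
  set transposed(drop=_name_);   /* _NAME_ only holds the text "Cost_of_claim" */
  array claims {*} _numeric_;    /* Month, ID and every claim-type column */
  do i = 1 to dim(claims);
    if missing(claims{i}) then claims{i} = 0;
  end;
  drop i;
run;
```

Using `_numeric_` instead of listing Collision, Fire, etc. keeps the step independent of which claim types happen to occur in the data.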

Answer 2

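You could try dynamic SQL with PIVOT, but performance depends on how many distinct claim types you have. (This is SQL Server / T-SQL syntax, not SAS.)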

create table #mytable (Month int, ID int, Car varchar(20), Claim_Type varchar(20),  Cost_of_claim int)

insert into #mytable values 
(1, 1243, 'Ferrari', 'Collision', 12000)
, (2, 6437, 'Peugeot', 'Fire', 50000)
, (5, 184, 'Citroen', 'Stole', 3000)
, (9, 1930, 'Fiat', 'Medical', 1000)
, (3, 2934, 'GM', 'Liability', 20000)
, (12, 4455, 'Ford', 'Theft', 20)


DECLARE @cols AS NVARCHAR(MAX),
    @query  AS NVARCHAR(MAX)

-- Build a comma-separated list of bracket-quoted claim types, e.g. [Collision],[Fire],...
select @cols = STUFF((SELECT ',' + QUOTENAME(Claim_Type) 
                    from #mytable
                    group by Claim_Type
                    order by Claim_Type
            FOR XML PATH(''), TYPE
            ).value('.', 'NVARCHAR(MAX)') 
        ,1,1,'')

-- Build the PIVOT query around that column list, then execute it
set @query = N'SELECT ' + 'month,id,car,' + @cols + N' from 
             (
                select month,id, car, Cost_of_claim, Claim_Type
                from #mytable               
            ) x
            pivot 
            (
                max(Cost_of_claim)
                for Claim_Type in (' + @cols + N')
            ) p 
            '

exec sp_executesql @query;

drop table #mytable

Answer 3

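This approach fills a macro variable with all possible claim_type values and loops over them, generating the variables the same way as your example code, so you don't have to type out every possible case. The "backstop" variable is needed because of the comma that the loop emits after each variable (SAS raises an error in the PROC SQL step if no further variable follows the last comma).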

data have;
   input Month ID Car :$12. Claim_Type :$12. Cost_of_claim;
   datalines;
  1    1243   Ferrari    Collision      12000
  2    6437   Peugeot    Fire           50000
  5    0184   Citroen    Stole           3000
  9    1930   Fiat       Medical         1000
  3    2934   GM         Liability      20000
    ;
run;


%macro your_macro;

    proc sql noprint;
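        /* Collect every distinct claim type into one space-separated macro variable */
        select distinct claim_type into :list_of_claims separated by " " from have;

        /* One CASE expression per claim type; BACKSTOP absorbs the trailing comma */
        create table want (drop = backstop) as select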
            month, id, car,
                %do i = 1 %to %sysfunc(countw(&list_of_claims.));
                %let this_claim = %scan(&list_of_claims., &i.);
                    case when claim_type = "&this_claim." then cost_of_claim else 0 end as &this_claim.,
                %end;
            1 as backstop
        from have;
    quit;

%mend your_macro;

%your_macro;
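A variation on the same idea, sketched here under the assumption that every claim_type value is a valid SAS variable name (as in the sample data): let PROC SQL generate the complete CASE expressions and join them with `separated by ', '`. Because SQL inserts the commas only between items, neither the macro loop nor the backstop column is needed. It assumes the HAVE data set above; CASE_LIST and WANT2 are hypothetical names.

```sas
proc sql noprint;
  /* Build one CASE expression per distinct claim type, joined by commas.
     CAT (not CATS) is used so the blank before the alias is preserved. */
  select distinct cat('case when claim_type = "', strip(claim_type),
                      '" then cost_of_claim else 0 end as ', strip(claim_type))
    into :case_list separated by ', '
    from have;

  /* &case_list resolves here, after the SELECT INTO above has run */
  create table want2 as
    select month, id, car, &case_list.
    from have;
quit;
```

The trade-off is the same as in the macro version: the column list is derived from the data, so a new claim type appears as a new column without any code change.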
