Convert an aggregated data frame from long to wide (without reshape, reshape2, or tidyr)

Time: 2019-04-04 07:02:11

Tags: r dataframe reshape

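Summary: I am working in a strict environment where I cannot install any packages. I have access to dcast(), xtabs(), and reshape() from {stats}. I do not have access to the tidyr, reshape, or reshape2 packages.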

The problem: I have an aggregated data frame with four columns named cust_id, merchant_group, sum, and max, as shown below:

       cust_id merchant_group          sum   max
         <int> <chr>                  <dbl> <dbl>
 1         495 AIRLINE               45493 4950 
 2         495 AUTO RENTAL            3104 1000 
 3         495 CLOTHING STORES       20928 3140 
 4         495 DEPARTMENT STORES      1082  495
 5         495 DRUG STORES             482  165

I would like to reshape it into a wide format like this:

cust_id AIRLINE AUTO RENTAL CLOTHING STORES DEPARTMENT STORES DRUG STORES
    495   45493        3104           20928              1082         482
    495    4950        1000            3140               495         165

I have tried the following functions:

xtabs(sum~cust_id+merchant_group, data=my.data)

reshape(my.data, idvar = "cust_id", timevar = "merchant_group", direction = "wide")

Neither of them solves my problem. Thanks in advance for your time.

1 answer:

Answer 0: (score: 0)

If you must use stats::reshape(), you can:

(1) Reshape the data into a longer format in which sum and max both sit in a single column:

my.data.longer <- stats::reshape(data = my.data,
                                 idvar = 1:2,
                                 v.names = "value",
                                 timevar = "variable",
                                 times = c("sum", "max"),
                                 varying = 3:4,
                                 direction = "long")

which looks like this (do not worry about the row.names for now):

                          cust_id    merchant_group variable value
495.AIRLINE.sum               495           AIRLINE      sum 45493
495.AUTO RENTAL.sum           495       AUTO RENTAL      sum  3104
495.CLOTHING STORES.sum       495   CLOTHING STORES      sum 20928
495.DEPARTMENT STORES.sum     495 DEPARTMENT STORES      sum  1082
495.DRUG STORES.sum           495       DRUG STORES      sum   482
495.AIRLINE.max               495           AIRLINE      max  4950
495.AUTO RENTAL.max           495       AUTO RENTAL      max  1000
495.CLOTHING STORES.max       495   CLOTHING STORES      max  3140
495.DEPARTMENT STORES.max     495 DEPARTMENT STORES      max   495
495.DRUG STORES.max           495       DRUG STORES      max   165
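
To try this step without the original data, here is a minimal sketch. The sample values are copied from the table in the question; building my.data as a plain data.frame (the question shows a tibble) and referring to columns by name instead of by position are assumptions made for illustration.

```r
# Sample data copied from the question. A plain data.frame is an assumption;
# the question prints a tibble.
my.data <- data.frame(
  cust_id = rep(495L, 5),
  merchant_group = c("AIRLINE", "AUTO RENTAL", "CLOTHING STORES",
                     "DEPARTMENT STORES", "DRUG STORES"),
  sum = c(45493, 3104, 20928, 1082, 482),
  max = c(4950, 1000, 3140, 495, 165),
  stringsAsFactors = FALSE
)

# Stack `sum` and `max` into one `value` column; `variable` records
# which of the two each row came from.
my.data.longer <- stats::reshape(
  data      = my.data,
  idvar     = c("cust_id", "merchant_group"),
  varying   = c("sum", "max"),
  v.names   = "value",
  timevar   = "variable",
  times     = c("sum", "max"),
  direction = "long"
)

dim(my.data.longer)             # 10 rows, 4 columns
table(my.data.longer$variable)  # 5 rows each for "max" and "sum"
```

If the real data is a tibble and reshape() misbehaves, converting it first with as.data.frame() is a reasonable thing to try.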

(2) Reshape the longer data into the desired wide format:

my.data.wide <- stats::reshape(data = my.data.longer,
                               idvar = c("cust_id", "variable"),
                               timevar = "merchant_group",
                               times = as.character(my.data$merchant_group),
                               v.names = "value",
                               direction = "wide")

which looks like this:

                cust_id variable value.AIRLINE value.AUTO RENTAL value.CLOTHING STORES value.DEPARTMENT STORES value.DRUG STORES
495.AIRLINE.sum     495      sum         45493              3104                 20928                    1082               482
495.AIRLINE.max     495      max          4950              1000                  3140                     495               165

(3) Drop the variable column, rename the columns, and reset the row.names:

my.data.wide$variable <- NULL
# Strip the "value." prefix added by reshape(); this also works when
# my.data holds more than one customer
names(my.data.wide) <- sub("^value\\.", "", names(my.data.wide))
row.names(my.data.wide) <- NULL
my.data.wide

The result is:

  cust_id AIRLINE AUTO RENTAL CLOTHING STORES DEPARTMENT STORES DRUG STORES
1     495   45493        3104           20928              1082         482
2     495    4950        1000            3140               495         165
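
The three steps can also be wrapped into one helper that copes with more than one customer. This is only a sketch under assumptions: the function name widen_by_measure and the two-customer sample data are hypothetical, the column names cust_id and merchant_group are hard-coded to match the question, and the final column names are recovered by stripping the "value." prefix that reshape() adds.

```r
# Hypothetical helper: one output row per (cust_id, measure) pair and one
# column per merchant_group. Uses only stats::reshape() from base R.
widen_by_measure <- function(df, measures = c("sum", "max")) {
  df <- as.data.frame(df)  # in case a tibble is passed in
  # Step 1: stack the measure columns into a single `value` column.
  longer <- stats::reshape(
    df,
    idvar = c("cust_id", "merchant_group"),
    varying = measures, v.names = "value",
    timevar = "variable", times = measures,
    direction = "long"
  )
  # Step 2: spread merchant_group across columns.
  wide <- stats::reshape(
    longer,
    idvar = c("cust_id", "variable"),
    timevar = "merchant_group", v.names = "value",
    direction = "wide"
  )
  # Step 3: reshape() names the new columns "value.<merchant_group>".
  names(wide) <- sub("^value\\.", "", names(wide))
  row.names(wide) <- NULL
  wide
}

# Made-up data with two customers; customer 600 has no DRUG STORES row.
demo <- data.frame(
  cust_id = c(495L, 495L, 600L),
  merchant_group = c("AIRLINE", "DRUG STORES", "AIRLINE"),
  sum = c(45493, 482, 100),
  max = c(4950, 165, 60),
  stringsAsFactors = FALSE
)
res <- widen_by_measure(demo)
res
```

The helper keeps the variable column so that each row still says whether it holds the sum or the max; drop it with res$variable <- NULL to match the layout above. Merchant groups that a customer never used come out as NA.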