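Intro: I work in a strict environment where I cannot install any packages. I have access to dcast(), xtabs() and reshape() from {stats}. I cannot use the tidyr, reshape or reshape2 packages.
Now the problem: I have a summarised data frame with 4 columns named cust_id, merchant_group, sum and max, shown below: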
  cust_id merchant_group      sum   max
    <int> <chr>             <dbl> <dbl>
1     495 AIRLINE           45493  4950
2     495 AUTO RENTAL        3104  1000
3     495 CLOTHING STORES   20928  3140
4     495 DEPARTMENT STORES  1082   495
5     495 DRUG STORES         482   165
I would like to reshape it into a wide format like this:
cust_id AIRLINE AUTO RENTAL CLOTHING STORES DEPARTMENT STORES DRUG STORES
    495   45493        3104           20928              1082         482
    495    4950        1000            3140               495         165
I have tried the following functions:
xtabs(sum~cust_id+merchant_group, data=my.data)
reshape(my.data, idvar = "cust_id", timevar = "merchant_group", direction = "wide")
But neither solves my problem. Thanks in advance for your time.
Answer 0 (score: 0)
If you have to use stats::reshape(), you can:
(1) Reshape the data into a longer format in which sum and max both sit in a single column:
my.data.longer <- stats::reshape(data = my.data,
                                 idvar = 1:2,
                                 v.names = "value",
                                 timevar = "variable",
                                 times = c("sum", "max"),
                                 varying = 3:4,
                                 direction = "long")
which looks like this (don't worry about the row.names for now):
                          cust_id    merchant_group variable value
495.AIRLINE.sum               495           AIRLINE      sum 45493
495.AUTO RENTAL.sum           495       AUTO RENTAL      sum  3104
495.CLOTHING STORES.sum       495   CLOTHING STORES      sum 20928
495.DEPARTMENT STORES.sum     495 DEPARTMENT STORES      sum  1082
495.DRUG STORES.sum           495       DRUG STORES      sum   482
495.AIRLINE.max               495           AIRLINE      max  4950
495.AUTO RENTAL.max           495       AUTO RENTAL      max  1000
495.CLOTHING STORES.max       495   CLOTHING STORES      max  3140
495.DEPARTMENT STORES.max     495 DEPARTMENT STORES      max   495
495.DRUG STORES.max           495       DRUG STORES      max   165
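To try step (1) end to end, the input can be rebuilt from the table in the question. This is only a minimal sketch: the data.frame() call is my reconstruction of the asker's data, and the as.data.frame() line is an assumption in case my.data is actually a tibble (the <int>/<chr> header suggests it might be), since stats::reshape() is written with plain data frames in mind.

```r
# Reconstruction of the question's data (assumed from the printed table).
my.data <- data.frame(
  cust_id        = rep(495L, 5),
  merchant_group = c("AIRLINE", "AUTO RENTAL", "CLOTHING STORES",
                     "DEPARTMENT STORES", "DRUG STORES"),
  sum            = c(45493, 3104, 20928, 1082, 482),
  max            = c(4950, 1000, 3140, 495, 165),
  stringsAsFactors = FALSE
)

# Harmless for a plain data frame; drops tibble classes if they are present.
my.data <- as.data.frame(my.data)

# Same call as step (1), with columns referenced by name instead of position.
my.data.longer <- stats::reshape(data = my.data,
                                 idvar = c("cust_id", "merchant_group"),
                                 v.names = "value",
                                 timevar = "variable",
                                 times = c("sum", "max"),
                                 varying = c("sum", "max"),
                                 direction = "long")

nrow(my.data.longer)  # one row per (cust_id, merchant_group, variable)
```

Referring to columns by name rather than by 1:2 and 3:4 is a design choice on my part; it keeps the call valid if the column order ever changes.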
(2) Reshape the longer data into the desired wide format:
# `times` is only used for direction = "long", so it is omitted here;
# the column suffixes come from the values of `timevar`
my.data.wide <- stats::reshape(data = my.data.longer,
                               idvar = c("cust_id", "variable"),
                               timevar = "merchant_group",
                               v.names = "value",
                               direction = "wide")
which looks like this:
                cust_id variable value.AIRLINE value.AUTO RENTAL value.CLOTHING STORES value.DEPARTMENT STORES value.DRUG STORES
495.AIRLINE.sum     495      sum         45493              3104                 20928                    1082               482
495.AIRLINE.max     495      max          4950              1000                  3140                     495               165
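Putting "variable" into idvar is what keeps sum and max on separate rows. For comparison, here is a sketch of what the single-step reshape() call from the question produces on the same data (again using my reconstruction of my.data, so treat the input as an assumption): with only cust_id as the id, both measures get spread into columns at once.

```r
# Assumed reconstruction of the question's data.
my.data <- data.frame(
  cust_id        = 495L,
  merchant_group = c("AIRLINE", "AUTO RENTAL", "CLOTHING STORES",
                     "DEPARTMENT STORES", "DRUG STORES"),
  sum            = c(45493, 3104, 20928, 1082, 482),
  max            = c(4950, 1000, 3140, 495, 165)
)

# The single-step call from the question: sum and max are both treated
# as time-varying, so each merchant_group gets a sum.* and a max.* column.
one.step <- stats::reshape(my.data, idvar = "cust_id",
                           timevar = "merchant_group", direction = "wide")

dim(one.step)    # a single row per cust_id, not one row per measure
names(one.step)  # cust_id, then sum.<group> / max.<group> pairs
```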
(3) Drop the variable column, rename the columns and reset the row.names:
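my.data.wide$variable <- NULL
# strip the "value." prefix that reshape() added; unlike assigning
# my.data$merchant_group, this also works with more than one cust_id
names(my.data.wide) <- sub("^value\\.", "", names(my.data.wide))
row.names(my.data.wide) <- NULL
my.data.wide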
The result is:
  cust_id AIRLINE AUTO RENTAL CLOTHING STORES DEPARTMENT STORES DRUG STORES
1     495   45493        3104           20928              1082         482
2     495    4950        1000            3140               495         165
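Since xtabs() is also available in your environment, another base-only route is to build one cross-tabulation per measure and stack the two. This is just a sketch and not the reshape() method above: to.df is a hypothetical helper name, and the input is my reconstruction of the question's data. Keep in mind that xtabs() sums the values in each cell and fills absent cust_id/merchant_group combinations with 0, so it only suits data that is already aggregated to one row per pair.

```r
# Assumed reconstruction of the question's data.
my.data <- data.frame(
  cust_id        = 495L,
  merchant_group = c("AIRLINE", "AUTO RENTAL", "CLOTHING STORES",
                     "DEPARTMENT STORES", "DRUG STORES"),
  sum            = c(45493, 3104, 20928, 1082, 482),
  max            = c(4950, 1000, 3140, 495, 165)
)

# One cust_id x merchant_group table per measure.
sum.tab <- stats::xtabs(sum ~ cust_id + merchant_group, data = my.data)
max.tab <- stats::xtabs(max ~ cust_id + merchant_group, data = my.data)

# Hypothetical helper: turn a 2-d table into a data frame with cust_id
# as a regular column instead of row names.
to.df <- function(tab) {
  out <- as.data.frame.matrix(tab)
  out <- cbind(cust_id = as.integer(rownames(out)), out)
  rownames(out) <- NULL
  out
}

# Stack the sum rows on top of the max rows.
alt.wide <- rbind(to.df(sum.tab), to.df(max.tab))
alt.wide
```

With several customers this puts all sum rows first and all max rows after them, so you may want to add a column marking the measure before calling rbind().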