Rolling up data together with a transpose

Time: 2014-05-10 18:28:50

Tags: r excel reshape

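I want to roll up my data to one row per unique customer ID, with each observation for that customer transposed into its own column, as shown below. Here is a snapshot of my data: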

basedata <- structure(list(customer = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "d"), class = "factor"), obs = c(12L, 
11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L)), .Names = c("customer", "obs"
), class = "data.frame", row.names = c(NA, -9L))

or

customer   obs
a          12
a          11
a          12
a          10
b          3
b          5
b          7
d          8
d          1

I would like to transform it into the following form:

customer    obs1    obs2    obs3    obs4
a           12      11      12      10
b           3       5       7       -
d           8       1       -       -

I used the following code:

basedata$shopping <- unlist(tapply(rawdata$customer, rawdata$customer,
                        function (x) seq(1, len = length(x))))
reshape(basedata, idvar = "customer", direction = "wide")

It fails with the following error:

Error in `[.data.frame`(data, , timevar) : undefined columns selected

How can I do this in R and in Excel? Thanks.

3 answers:

Answer 0: (score: 2)

x <- structure(list(customer = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "d"), class = "factor"), obs = c(12L, 
11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L)), .Names = c("customer", "obs"
), class = "data.frame", row.names = c(NA, -9L))

I chose to use a couple of add-on packages (plyr and reshape2) because I find them easier and more general than the reshape function in base R.

library(plyr)
library(reshape2)
## add observation number
x2 <- ddply(x,"customer",transform,num=1:length(customer))
## reshape
dcast(x2,customer~num,value.var="obs")

Answer 1: (score: 1)

A base R approach, assuming dat is your data:

> s <- split(dat$obs, dat$customer)
> df <- data.frame(do.call(rbind, lapply(s, function(x){ length(x) <- 4; x })))
> names(df) <- paste0('obs', seq(df))
> df
#   obs1 obs2 obs3 obs4
# a   12   11   12   10
# b    3    5    7   NA
# d    8    1   NA   NA

If you want the unique customer ID as a column:

> df2 <- cbind(customer = rownames(df), df)
> rownames(df2) <- seq(nrow(df2))
> df2
#   customer obs1 obs2 obs3 obs4
# 1        a   12   11   12   10
# 2        b    3    5    7   NA
# 3        d    8    1   NA   NA
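
The snippet above hard-codes the padded length as 4, which matches this sample data. As a minimal sketch of the same split-and-pad idea, assuming you would rather derive the width from the longest group, the following rebuilds the question's data (here named dat, with customer kept as character, which is an assumption made for this illustration) and computes the width with max(lengths(s)):

```r
# Sample data from the question, rebuilt here so the sketch is self-contained.
# Assumption: customer is stored as character rather than factor.
dat <- data.frame(
  customer = c("a", "a", "a", "a", "b", "b", "b", "d", "d"),
  obs      = c(12L, 11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L),
  stringsAsFactors = FALSE
)

# One vector of observations per customer
s <- split(dat$obs, dat$customer)

# Width of the result: the size of the largest group
n <- max(lengths(s))

# Pad every group with NA up to length n, then stack the groups as rows
padded <- lapply(s, function(x) { length(x) <- n; x })
wide <- data.frame(customer = names(s),
                   do.call(rbind, padded),
                   row.names = NULL)

# Name the observation columns obs1 ... obsN
names(wide)[-1] <- paste0("obs", seq_len(n))
wide
```

Customers with fewer than n observations get NA in the trailing columns, as in the output shown above.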

Answer 2: (score: 0)

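I am assuming that "basedata" and "rawdata" are supposed to be the same (or at least copies of each other). If that is the case, you are only missing the timevar argument of reshape. Without it, reshape falls back to its default, timevar = "time", and no such column exists in your data, hence the "undefined columns selected" error.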

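Continuing from where you left off:

rawdata$shopping <- unlist(tapply(rawdata$customer, rawdata$customer,
                                  function (x) seq(1, len = length(x))))
## rawdata$shopping <- with(rawdata, ave(seq_along(customer), customer, FUN = seq_along))

The actual reshaping step is then the reshape call with timevar set to the new shopping column.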
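
As a minimal sketch of the reshape call with timevar supplied, assuming the counter column is named shopping as in the question and that rawdata holds the question's sample data (rebuilt here with customer as character, an assumption made so the block is self-contained):

```r
# Sample data from the question, rebuilt so the sketch runs on its own.
# Assumption: customer is stored as character rather than factor.
rawdata <- data.frame(
  customer = c("a", "a", "a", "a", "b", "b", "b", "d", "d"),
  obs      = c(12L, 11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L),
  stringsAsFactors = FALSE
)

# Running observation number within each customer; this becomes the timevar
rawdata$shopping <- ave(seq_along(rawdata$customer), rawdata$customer,
                        FUN = seq_along)

# Long to wide: one row per customer, one obs column per value of shopping
wide <- reshape(rawdata, idvar = "customer", timevar = "shopping",
                direction = "wide")
wide
```

With these arguments the observation columns come out as obs.1 to obs.4, and customers with fewer observations get NA in the remaining columns. Passing sep = "" to reshape should give obs1 to obs4 instead, matching the layout requested in the question.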