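I would like to aggregate at the level of each unique customer ID, with every observation transposed into its own column, as shown below. Here is a snapshot of my data: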
basedata <- structure(list(customer = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L), .Label = c("a", "b", "d"), class = "factor"), obs = c(12L,
11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L)), .Names = c("customer", "obs"
), class = "data.frame", row.names = c(NA, -9L))
Or, as a table:
customer obs
a 12
a 11
a 12
a 10
b 3
b 5
b 7
d 8
d 1
I want to transform it into the following form:
customer obs1 obs2 obs3 obs4
a 12 11 12 10
b 3 5 7 -
d 8 1 - -
I used the following code:
basedata$shopping <- unlist(tapply(rawdata$customer, rawdata$customer,
function (x) seq(1, len = length(x))))
reshape(basedata, idvar = "customer", direction = "wide")
It gives the following error:
Error in `[.data.frame`(data, , timevar) : undefined columns selected
How can I do this in R and in Excel? Thanks.
Answer 0 (score: 2)
x <- structure(list(customer = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L), .Label = c("a", "b", "d"), class = "factor"), obs = c(12L,
11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L)), .Names = c("customer", "obs"
), class = "data.frame", row.names = c(NA, -9L))
I chose to use a couple of extra packages (plyr and reshape2) because I find them easier and more general than the reshape function in base R.
library(plyr)
library(reshape2)
## add observation number
x2 <- ddply(x,"customer",transform,num=1:length(customer))
## reshape
dcast(x2,customer~num,value.var="obs")
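For readers using the newer tidyverse packages, the same two steps (number the observations within each customer, then spread them into columns) can be sketched with dplyr and tidyr. This is an alternative, not part of the original answer, and it assumes both packages are installed. names_prefix = "obs" produces the obs1 ... obs4 headers from the question; missing cells come out as NA rather than "-".

```r
# Sketch assuming the dplyr and tidyr packages are installed.
library(dplyr)
library(tidyr)

# Same data as the question's snapshot
x <- data.frame(
  customer = factor(c("a", "a", "a", "a", "b", "b", "b", "d", "d")),
  obs      = c(12L, 11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L)
)

wide <- x %>%
  group_by(customer) %>%
  mutate(num = row_number()) %>%   # 1, 2, 3, ... within each customer
  ungroup() %>%
  pivot_wider(names_from = num, values_from = obs, names_prefix = "obs")

print(wide)
```

Unlike the dcast call above, whose value columns are named 1, 2, 3, 4, this version already uses the obs1 ... obs4 names.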
Answer 1 (score: 1)
A base R approach, assuming dat is the data:
> s <- split(dat$obs, dat$customer)
> df <- data.frame(do.call(rbind, lapply(s, function(x){ length(x) <- 4; x })))
> names(df) <- paste0('obs', seq(df))
> df
# obs1 obs2 obs3 obs4
# a 12 11 12 10
# b 3 5 7 NA
# d 8 1 NA NA
If you want the unique customer IDs as a column:
> df2 <- cbind(customer = rownames(df), df)
> rownames(df2) <- seq(nrow(df2))
> df2
# customer obs1 obs2 obs3 obs4
# 1 a 12 11 12 10
# 2 b 3 5 7 NA
# 3 d 8 1 NA NA
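The code above hard-codes length(x) <- 4, which only fits this particular snapshot. A minimal sketch of the same split/pad/rbind idea that instead pads to the longest customer group and adds the customer column in one step (the names n, padded and m are illustrative, not from the original answer):

```r
# Same data as the question's snapshot
dat <- data.frame(
  customer = factor(c("a", "a", "a", "a", "b", "b", "b", "d", "d")),
  obs      = c(12L, 11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L)
)

s <- split(dat$obs, dat$customer)   # one integer vector per customer
n <- max(lengths(s))                # size of the largest group (4 here)

# Pad every vector with NA up to length n, then stack them as rows
padded <- lapply(s, function(v) { length(v) <- n; v })
m <- do.call(rbind, padded)
colnames(m) <- paste0("obs", seq_len(n))

# Customer IDs as a column, with plain 1..k row names
df2 <- data.frame(customer = names(s), m, row.names = NULL)
print(df2)
```

Because n is computed from the data, a customer with five or more observations simply widens the table instead of being truncated.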
Answer 2 (score: 0)
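I'm assuming that "basedata" and "rawdata" are supposed to be the same (or at least copies of each other). If that's the case, you are only missing the timevar argument to reshape, which tells it which column holds the observation number.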
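A minimal sketch of the corrected base R call, assuming (as above) that basedata and rawdata are the same object. It builds the counter with ave() rather than unlist(tapply(...)): the tapply version returns the counters grouped by customer, so it only lines up with the rows when the data are already sorted by customer, whereas ave() writes each value back to its original row. sep = "" gives the columns obs1 ... obs4 instead of the default obs.1 ... obs.4.

```r
# Same data as the question's snapshot
basedata <- data.frame(
  customer = factor(c("a", "a", "a", "a", "b", "b", "b", "d", "d")),
  obs      = c(12L, 11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L)
)

# Running counter within each customer (row order does not matter)
basedata$shopping <- ave(seq_along(basedata$customer), basedata$customer,
                         FUN = seq_along)

# timevar was the missing argument in the question's call
wide <- reshape(basedata, idvar = "customer", timevar = "shopping",
                v.names = "obs", direction = "wide", sep = "")
rownames(wide) <- NULL
print(wide)
```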
Continuing from where you left off:
rawdata$shopping <- unlist(tapply(rawdata$customer, rawdata$customer,
                                  function (x) seq(1, len = length(x))))
## rawdata$shopping <- with(rawdata, ave(seq_along(customer), customer, FUN = seq_along))
This is the actual reshaping step: