Reshape a dataset in R

Date: 2012-11-01 16:49:11

Tags: r reshape

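I need to reshape my data from long to wide, but I don't have a time variable. What is the simplest way to create a time variable for each id in the dataset below, so that I can then reshape it in base R (not with the reshape package)?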

h<-seq(from=as.Date("2005-06-01"), to=as.Date("2008-06-30"), by=1)

a<-data.frame(id=sample(c(1:100),300,replace=T),val=rnorm(n=300),date=sample(h,300,replace=T))


1 Answer:

Answer 0 (score: 4)

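Here is one possible approach. It uses ave to create a "time" variable that numbers the rows within each "id" (1 for the first occurrence of an id, 2 for the second, and so on), which sounds like what you are looking for.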
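To see what ave(..., FUN = seq_along) does on its own, here is a minimal sketch on a small made-up id vector (ids is a hypothetical name, used only for illustration). ave() applies FUN separately to each group and puts the results back in the original positions:

```r
# Hypothetical toy ids, for illustration only
ids <- c(3, 1, 3, 2, 1, 3)

# Within each group of equal ids, seq_along() yields 1, 2, 3, ...;
# ave() writes those values back at the rows each group came from,
# so every occurrence of an id gets its running count.
time <- ave(ids, ids, FUN = seq_along)
time
# [1] 1 1 2 1 2 3
```

Note that the numbering follows the current row order, which is why the data below are sorted by id and date first.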

Here is your data, but ordered (and using set.seed so that others can reproduce it):

set.seed(1)
h <- seq(from=as.Date("2005-06-01"), 
         to=as.Date("2008-06-30"), by=1)
a <- data.frame(id=sample(c(1:100), 300, replace=TRUE),
                val=rnorm(n=300), 
                date=sample(h, 300, replace=TRUE))
rm(h)
a <- a[order(a$id, a$date), ]
rbind(head(a), tail(a))
#      id         val       date
# 27    2  0.78763961 2007-08-25
# 116   2  0.27005490 2008-03-05
# 281   3 -2.03328560 2006-08-08
# 47    3  1.44115771 2007-06-25
# 133   4  1.32425863 2006-06-14
# 228   5 -0.14587563 2006-10-15
# 111  98  0.95101281 2008-04-29
# 293  99 -0.01825971 2006-01-20
# 139  99  0.43370215 2008-02-20
# 121 100 -0.25893258 2005-06-07
# 18  100 -1.42449465 2007-08-19
# 104 100 -0.24766434 2008-05-11

Checking with table shows that the largest number of rows for any single id is 8, so you will end up with 8 "times":

max(table(a$id))
# [1] 8
a$time <- ave(a$id, a$id, FUN=seq_along)
rbind(head(a), tail(a))
#      id         val       date time
# 27    2  0.78763961 2007-08-25    1
# 116   2  0.27005490 2008-03-05    2
# 281   3 -2.03328560 2006-08-08    1
# 47    3  1.44115771 2007-06-25    2
# 133   4  1.32425863 2006-06-14    1
# 228   5 -0.14587563 2006-10-15    1
# 111  98  0.95101281 2008-04-29    1
# 293  99 -0.01825971 2006-01-20    1
# 139  99  0.43370215 2008-02-20    2
# 121 100 -0.25893258 2005-06-07    1
# 18  100 -1.42449465 2007-08-19    2
# 104 100 -0.24766434 2008-05-11    3
a.wide <- reshape(a, direction = "wide", idvar="id", timevar="time")
a.wide[1:8, 1:8]
#     id      val.1     date.1      val.2     date.2     val.3     date.3    val.4
# 27   2  0.7876396 2007-08-25  0.2700549 2008-03-05        NA       <NA>       NA
# 281  3 -2.0332856 2006-08-08  1.4411577 2007-06-25        NA       <NA>       NA
# 133  4  1.3242586 2006-06-14         NA       <NA>        NA       <NA>       NA
# 228  5 -0.1458756 2006-10-15  0.5929847 2008-03-31        NA       <NA>       NA
# 299  6  0.2368037 2006-02-06  1.0341077 2006-10-07        NA       <NA>       NA
# 10   7  1.8692906 2005-07-19 -0.4839749 2006-06-02  1.435070 2007-11-30 1.017754
# 158  8  0.5672209 2006-08-28 -0.4075286 2006-11-11 -2.285236 2007-03-29       NA
# 69   9  0.6422413 2008-06-20         NA       <NA>        NA       <NA>       NA
names(a.wide)
#  [1] "id"     "val.1"  "date.1" "val.2"  "date.2" "val.3"  "date.3" "val.4"  
#  [9] "date.4" "val.5"  "date.5" "val.6"  "date.6" "val.7"  "date.7" "val.8" 
# [17] "date.8"
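The approach above relies on sorting the data by id and date before numbering. If you would rather leave the row order alone, one alternative (a sketch of my own, not part of the answer above) is to rank the dates within each id. This assumes you want "time" to follow date order; ties.method = "first" breaks same-day ties by row position, which matches what the stable order() call above does:

```r
set.seed(1)
h <- seq(from = as.Date("2005-06-01"), to = as.Date("2008-06-30"), by = 1)
a <- data.frame(id = sample(c(1:100), 300, replace = TRUE),
                val = rnorm(n = 300),
                date = sample(h, 300, replace = TRUE))
rm(h)

# Rank each id's dates without reordering the data frame.
# The Dates are passed as numbers so ave() works on a plain numeric vector.
a$time <- ave(as.numeric(a$date), a$id,
              FUN = function(d) rank(d, ties.method = "first"))

# reshape() creates the wide columns in order of first appearance of each
# time value, so sort by time first to get val.1, date.1, val.2, ...
a.wide <- reshape(a[order(a$time), ], direction = "wide",
                  idvar = "id", timevar = "time")
names(a.wide)[1:3]
# [1] "id"     "val.1"  "date.1"
```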