Convert a data frame to part "wide" and part summary, by group

Time: 2012-12-08 17:41:50

Tags: r plyr reshape reshape2

I have the following data frame:

id datestamp hrofday val1 val2 val3
a  20120401 0 3.2 0 1
a  20120401 1 3.3 4 0
a  20120401 2 3.4 6 0
...
a  20120401 23 7.3 0 2

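Each row holds a user id, a datestamp, the hour of the day, and the values val1, val2 and val3. I would like to use cast from reshape, or base R, to put the data frame into the shape below. For each id, I want the val1 values spread across 24 fields (one per hour), together with the totals of val2 and val3. For example: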

id datestamp val1.0 val1.1 val1.2 ... val1.23 total.val2 total.val3
a  20120401 3.2 3.3 3.4 ... 7.3 10 3
...

Any help is much appreciated. Thanks in advance.

3 Answers:

Answer 0 (score: 3)

Not exactly what you asked for (the column names are slightly different):

vecs <- by(dfrm, dfrm$id, FUN = function(dfr) {
  # One row per id: the 24 hourly val1 values spread across columns, plus totals
  data.frame(id = dfr$id[1], val1 = matrix(dfr$val1, ncol = 24),
             total.val2 = sum(dfr$val2), total.val3 = sum(dfr$val3))
})

dfvecs <- do.call(rbind, vecs) # Not really needed for single group 
                       # .. but it is the standard way to recombine by() results

> dfvecs
  id val1.1 val1.2 val1.3 val1.4 val1.5 val1.6 val1.7 val1.8 val1.9 val1.10 val1.11
a  a   6.94   5.45   2.83   9.23   2.92   8.37   2.86   2.67   1.87    2.32    3.17
  val1.12 val1.13 val1.14 val1.15 val1.16 val1.17 val1.18 val1.19 val1.20 val1.21
a    3.03    1.59     0.4    2.19    8.11    5.26    9.15    8.31    0.46    4.56
  val1.22 val1.23 val1.24 total.val2 total.val3
a    2.65    3.05    5.07     118.44      99.02
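If the hour-based names from the question (val1.0 to val1.23) and a datestamp column are wanted, the same split-and-recombine idea can be adapted. The following is only a sketch under assumptions: the sample data and the helper name `widen_group` are hypothetical, and every (id, datestamp) group is assumed to contain the same set of hours, since `rbind()` needs matching column names.

```r
# Hypothetical sample data: two ids, one day, 24 hourly rows per id.
dfrm <- data.frame(
  id        = rep(c("a", "b"), each = 24),
  datestamp = 20120401,
  hrofday   = rep(0:23, times = 2),
  val1      = (1:48) / 10,
  val2      = rep(c(1, 2), each = 24),
  val3      = 1
)

# Hypothetical helper: build one wide row for a single (id, datestamp) group.
widen_group <- function(dfr) {
  dfr <- dfr[order(dfr$hrofday), ]              # make the hour order explicit
  wide <- as.data.frame(t(dfr$val1))            # 1 x n data frame of val1 values
  names(wide) <- paste0("val1.", dfr$hrofday)   # name columns after the hour
  cbind(
    dfr[1, c("id", "datestamp")],
    wide,
    total.val2 = sum(dfr$val2),
    total.val3 = sum(dfr$val3)
  )
}

# Split on both keys, widen each piece, then stack the rows.
groups  <- split(dfrm, list(dfrm$id, dfrm$datestamp), drop = TRUE)
dfvecs2 <- do.call(rbind, lapply(groups, widen_group))
rownames(dfvecs2) <- NULL
```

Naming the columns from `hrofday` rather than from position keeps the labels correct even if the input rows are not sorted by hour.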

Answer 1 (score: 1)

I would split this into two parts: reshape val1 first, then summarize the rest.

val1

library(reshape2)
d <- read.table(file = 'clipboard', header = TRUE)
#   id hrofday val1 val2 val3
# 1  a       0  3.2    0    1
# 2  a       1  3.3    4    0
# 3  a       2  3.4    6    0
d.m <- melt(d, id.vars = 1:2)  # long format, keyed by id and hrofday
d.val1 <- dcast(d.m, id + variable ~ hrofday, value.var = "value")
#   id variable   0   1   2
# 1  a     val1 3.2 3.3 3.4
# 2  a     val2 0.0 4.0 6.0
# 3  a     val3 1.0 0.0 0.0
d.val1.format <- d.val1[d.val1$variable == "val1", -2]  # keep val1, drop `variable`
#   id   0   1   2
# 1  a 3.2 3.3 3.4

If val1 always has a fixed number of observations (one for each hour of the day), you can also do this (taken from another answer):

aggregate(val1 ~ id, d, c)
#   id val1.1 val1.2 val1.3
# 1  a    3.2    3.3    3.4
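One detail worth knowing about the `aggregate(..., c)` shortcut: the printed val1.1, val1.2, ... are not separate columns but a single matrix column, and the positions follow row order rather than the hour. The sketch below illustrates this on hypothetical sample data (the second id "b" is made up) and shows one way to flatten and rename the result.

```r
# Hypothetical three-hour sample; id "b" is invented for illustration.
d <- data.frame(
  id      = c("a", "a", "a", "b", "b", "b"),
  hrofday = c(0, 1, 2, 0, 1, 2),
  val1    = c(3.2, 3.3, 3.4, 1.0, 2.0, 3.0),
  val2    = c(0, 4, 6, 1, 1, 1),
  val3    = c(1, 0, 0, 2, 2, 2)
)

# Positions in the result follow row order, so sort by hour first.
d   <- d[order(d$id, d$hrofday), ]
agg <- aggregate(val1 ~ id, d, c)

dim(agg)              # 2 2: `val1` is one matrix column, not three columns
is.matrix(agg$val1)   # TRUE

# Flatten the matrix column into ordinary columns and name them by hour.
flat <- do.call(data.frame, agg)
names(flat)[-1] <- paste0("val1.", sort(unique(d$hrofday)))
```

Flattening before `merge()` or export avoids surprises from code that expects one plain vector per column.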

Summarizing the other variables:

d.others <- aggregate(d[, -(1:3)], by = list(d$id), FUN = sum)
#   Group.1 val2 val3
# 1       a   10    1

Then merge:

d.new <- merge(d.val1.format, d.others, by.x = "id", by.y = "Group.1")
#   id   0   1   2 val2 val3
# 1  a 3.2 3.3 3.4   10    1
colnames(d.new) <- gsub("^(\\d+)$", "val1.\\1", colnames(d.new))
#   id val1.0 val1.1 val1.2 val2 val3
# 1  a    3.2    3.3    3.4   10    1

Answer 2 (score: 0)

You can do this while sticking with base R's reshape(), aggregate(), and merge() functions.

Here is a minimal example.

First, some sample data:

set.seed(1) # So you can get the same results that I do
myDF <- data.frame(id = rep(c("a", "b", "c"), each = 48),
                   datestamp = rep(c("20120101", "20120102"), each = 24),
                   hrofday = rep(0:23, times = 6),
                   val1 = runif(144, min = 0, max = 10),
                   val2 = runif(144, min = 5, max = 15),
                   val3 = runif(144, min = 0, max = 5))
list(head(myDF), tail(myDF))
# [[1]]
#   id datestamp hrofday     val1      val2      val3
# 1  a  20120101       0 2.655087 12.293096 0.6611409
# 2  a  20120101       1 3.721239  9.525708 1.1065296
# 3  a  20120101       2 5.728534  6.751268 1.1319040
# 4  a  20120101       3 9.082078 12.466983 0.6570827
# 5  a  20120101       4 2.016819  6.049876 4.9078173
# 6  a  20120101       5 8.983897 13.645449 1.6350686
# 
# [[2]]
#     id datestamp hrofday     val1      val2      val3
# 139  c  20120102      18 9.850952 13.803191 0.4265550
# 140  c  20120102      19 5.076418  8.730634 4.6628596
# 141  c  20120102      20 6.827881  5.479591 4.1919203
# 142  c  20120102      21 6.015412  6.386282 4.3971665
# 143  c  20120102      22 2.388687  8.214921 4.6785623
# 144  c  20120102      23 2.581659  6.548316 0.3623032
#

Second, create the objects to be merged:

## Use `aggregate` to get the totals for `val2` and `val3`. I used the 
##   `list` structure to be able to define my desired column names
myAggregates <- aggregate(list(total.val2 = myDF$val2, total.val3 = myDF$val3), 
                          list(id = myDF$id, datestamp = myDF$datestamp), 
                          sum, na.rm = TRUE)
myAggregates
#   id datestamp total.val2 total.val3
# 1  a  20120101   229.0276   46.44113
# 2  b  20120101   234.9122   61.15198
# 3  c  20120101   238.5162   61.95309
# 4  a  20120102   269.6523   70.49336
# 5  b  20120102   238.5868   61.07377
# 6  c  20120102   198.4762   67.97553

## Use `reshape()` to change from long to wide. Drop `val2` and `val3`
##   before reshaping (can be done many ways, I did it here by name matching)
myDFwide <- reshape(myDF[!names(myDF) %in% c("val2", "val3")], direction="wide", 
                    idvar=c("id", "datestamp"), timevar="hrofday")
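The reshape() call assumes each id and datestamp pair has a row for every hour. As a small illustration of what happens otherwise, the sketch below uses hypothetical data (`gappy` is a made-up name) in which one id is missing an hour; reshape() still creates the column and fills the gap with NA.

```r
# Hypothetical data in which id "b" has no reading for hour 1.
gappy <- data.frame(
  id        = c("a", "a", "a", "b", "b"),
  datestamp = "20120101",
  hrofday   = c(0, 1, 2, 0, 2),
  val1      = c(3.2, 3.3, 3.4, 1.0, 3.0)
)

gappy_wide <- reshape(gappy, direction = "wide",
                      idvar = c("id", "datestamp"), timevar = "hrofday")

names(gappy_wide)          # "id" "datestamp" "val1.0" "val1.1" "val1.2"
is.na(gappy_wide$val1.1)   # FALSE for "a", TRUE for "b"
```

This is also why `na.rm = TRUE` in the aggregate() step matters only for NA values already present in the long data; missing rows simply contribute nothing to the totals.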

Third, use merge() to combine the two data.frames. I've included the output of str() so you can see the variable names and the type of content each one holds.

myDF2 <- merge(myDFwide, myAggregates)
str(myDF2)
# 'data.frame':    6 obs. of  28 variables:
#  $ id        : Factor w/ 3 levels "a","b","c": 1 1 2 2 3 3
#  $ datestamp : Factor w/ 2 levels "20120101","20120102": 1 2 1 2 1 2
#  $ val1.0    : num  2.66 2.67 7.32 3.47 4.55 ...
#  $ val1.1    : num  3.72 3.86 6.93 3.34 4.1 ...
#  $ val1.2    : num  5.729 0.134 4.776 4.764 8.109 ...
#  $ val1.3    : num  9.08 3.82 8.61 8.92 6.05 ...
#  $ val1.4    : num  2.02 8.7 4.38 8.64 6.55 ...
#  $ val1.5    : num  8.98 3.4 2.45 3.9 3.53 ...
#  $ val1.6    : num  9.447 4.821 0.707 7.773 2.703 ...
#  $ val1.7    : num  6.608 5.996 0.995 9.606 9.927 ...
#  $ val1.8    : num  6.29 4.94 3.16 4.35 6.33 ...
#  $ val1.9    : num  0.618 1.862 5.186 7.125 2.132 ...
#  $ val1.10   : num  2.06 8.27 6.62 4 1.29 ...
#  $ val1.11   : num  1.77 6.68 4.07 3.25 4.78 ...
#  $ val1.12   : num  6.87 7.94 9.13 7.57 9.24 ...
#  $ val1.13   : num  3.84 1.08 2.94 2.03 5.99 ...
#  $ val1.14   : num  7.7 7.24 4.59 7.11 9.76 ...
#  $ val1.15   : num  4.98 4.11 3.32 1.22 7.32 ...
#  $ val1.16   : num  7.18 8.21 6.51 2.45 3.57 ...
#  $ val1.17   : num  9.92 6.47 2.58 1.43 4.31 ...
#  $ val1.18   : num  3.8 7.83 4.79 2.4 1.48 ...
#  $ val1.19   : num  7.774 5.53 7.663 0.589 0.131 ...
#  $ val1.20   : num  9.347 5.297 0.842 6.423 7.156 ...
#  $ val1.21   : num  2.12 7.89 8.75 8.76 1.03 ...
#  $ val1.22   : num  6.517 0.233 3.391 7.789 4.463 ...
#  $ val1.23   : num  1.26 4.77 8.39 7.97 6.4 ...
#  $ total.val2: num  229 270 235 239 239 ...
#  $ total.val3: num  46.4 70.5 61.2 61.1 62 ...
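For repeated use, the three steps can be wrapped in one function. This is a sketch only: the function name `widen_and_total` and its arguments are hypothetical, not part of base R, and it assumes one row per id, datestamp and hour.

```r
# Hypothetical helper wrapping the aggregate / reshape / merge steps.
widen_and_total <- function(df, wide_var = "val1",
                            sum_vars = c("val2", "val3"),
                            idvar = c("id", "datestamp"),
                            timevar = "hrofday") {
  # Totals of the summed variables per group, renamed to total.<name>
  totals <- aggregate(df[sum_vars], df[idvar], sum, na.rm = TRUE)
  names(totals)[names(totals) %in% sum_vars] <- paste0("total.", sum_vars)

  # Long-to-wide for the single variable that is spread across hours
  wide <- reshape(df[c(idvar, timevar, wide_var)], direction = "wide",
                  idvar = idvar, timevar = timevar)

  merge(wide, totals, by = idvar)
}

# Usage on a small made-up data set: two ids, three hours each.
small <- data.frame(
  id        = rep(c("a", "b"), each = 3),
  datestamp = "20120101",
  hrofday   = rep(0:2, times = 2),
  val1      = 1:6,
  val2      = 1,
  val3      = 2
)
out <- widen_and_total(small)
names(out)
# "id" "datestamp" "val1.0" "val1.1" "val1.2" "total.val2" "total.val3"
```

Passing `by = idvar` to merge() makes the join keys explicit instead of relying on the columns the two data frames happen to share.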