I have the following data frame:
id datestamp hrofday val1 val2 val3
a 20120401 0 3.2 0 1
a 20120401 1 3.3 4 0
a 20120401 2 3.4 6 0
...
a 20120401 23 7.3 0 2
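Each row holds a user id, a datestamp, the hour of the day, and val1, val2 and val3. I would like to use cast from reshape, or base R, to get the data frame into the following shape. For each id, I want the val1 values spread across 24 fields (one per hour), along with the totals of val2 and val3. For example: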
id datestamp val1.0 val1.1 val1.2 ... val1.23 total.val2 total.val3
a 20120401 3.2 3.3 3.4 ... 7.3 10 3
...
Any help is much appreciated. Thanks in advance.
Answer 0 (score: 3)
Not exactly what you asked for (slightly different column names):
vecs <- by(dfrm, dfrm$id, FUN = function(dfr)
    # Assumes each id has exactly 24 rows (one per hour), already in hour order
    return(data.frame(id = dfr$id[1], val1 = matrix(dfr$val1, ncol = 24),
                      total.val2 = sum(dfr$val2), total.val3 = sum(dfr$val3)))
)
dfvecs <- do.call(rbind, vecs)  # Not really needed for single group
                                # .. but it is the standard way to recombine by() results
> dfvecs
id val1.1 val1.2 val1.3 val1.4 val1.5 val1.6 val1.7 val1.8 val1.9 val1.10 val1.11
a a 6.94 5.45 2.83 9.23 2.92 8.37 2.86 2.67 1.87 2.32 3.17
val1.12 val1.13 val1.14 val1.15 val1.16 val1.17 val1.18 val1.19 val1.20 val1.21
a 3.03 1.59 0.4 2.19 8.11 5.26 9.15 8.31 0.46 4.56
val1.22 val1.23 val1.24 total.val2 total.val3
a 2.65 3.05 5.07 118.44 99.02
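If the data spans several days per id, a variation on the same idea is to group on id and datestamp together and to name the columns after the actual hour (0 to 23). The following is only a sketch, not the answer's own code: it swaps by() for split() and lapply(), the sample data and the name dfvecs2 are made up for illustration, and it assumes every group contains all 24 hours.

```r
# Hypothetical sample data: one id, two days, 24 hourly rows per day
dfrm <- data.frame(id = "a",
                   datestamp = rep(c(20120401, 20120402), each = 24),
                   hrofday = rep(0:23, times = 2),
                   val1 = seq(0.5, 24, by = 0.5),
                   val2 = 1,
                   val3 = 2)

# Split on id and datestamp together, then build one wide row per group
pieces <- lapply(split(dfrm, list(dfrm$id, dfrm$datestamp), drop = TRUE),
                 function(dfr) {
  dfr <- dfr[order(dfr$hrofday), ]              # make sure hours run 0..23
  wide <- as.data.frame(matrix(dfr$val1, nrow = 1))
  names(wide) <- paste0("val1.", dfr$hrofday)   # val1.0 ... val1.23
  cbind(id = dfr$id[1], datestamp = dfr$datestamp[1], wide,
        total.val2 = sum(dfr$val2), total.val3 = sum(dfr$val3))
})

# Stack the per-group rows; this relies on all groups having the same columns
dfvecs2 <- do.call(rbind, pieces)
rownames(dfvecs2) <- NULL
dim(dfvecs2)
```

If some groups are missing hours, the column names differ between pieces and rbind() will fail, so such data would need to be padded first.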
Answer 1 (score: 1)
I would split this into two parts: reshape val1, then summarize the rest.
For val1:
require(reshape2)
d <- read.table(file='clipboard', header=TRUE)
# id hrofday val1 val2 val3
# 1 a 0 3.2 0 1
# 2 a 1 3.3 4 0
# 3 a 2 3.4 6 0
d.m <- melt(d,id.vars = 1:2)
d.val1 <- dcast(d.m,id + variable ~ hrofday)
# id variable 0 1 2
# 1 a val1 3.2 3.3 3.4
# 2 a val2 0.0 4.0 6.0
# 3 a val3 1.0 0.0 0.0
d.val1.format <- d.val1[d.val1$variable == "val1",-2]
# id 0 1 2
# 1 a 3.2 3.3 3.4
If there is always a fixed number of val1 observations per id (one for each hour of the day), you can also do this (from this answer):
aggregate(val1 ~ id, d, c)
#id val1.1 val1.2 val1.3
#1 a 3.2 3.3 3.4
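One detail worth knowing about this shortcut: when the function passed to aggregate() returns more than one value per group, the result stores them in a single matrix column rather than in separate columns. A minimal sketch of how that could be inspected and flattened follows; the two-id sample data and the name flat are assumptions for illustration.

```r
# Hypothetical sample data: two ids, three hours each, already in hour order
d <- data.frame(id = rep(c("a", "b"), each = 3),
                hrofday = rep(0:2, times = 2),
                val1 = c(3.2, 3.3, 3.4, 1.1, 1.2, 1.3))

# c() returns all val1 values of a group, so val1 becomes a matrix column
agg <- aggregate(val1 ~ id, d, c)
ncol(agg)            # counts the matrix as a single column
is.matrix(agg$val1)

# Rebuilding the data frame expands the matrix into ordinary columns
flat <- do.call(data.frame, agg)
names(flat)
```

The values are concatenated in row order within each id, so the data should be sorted by hrofday beforehand if the columns are meant to line up with hours.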
Summarize the other variables:
d.others <- aggregate(d[,-(1:3)],by=list(d$id),FUN=sum)
# Group.1 val2 val3
# 1 a 10 1
Then merge:
d.new <- merge(d.val1.format,d.others,by.x="id",by.y="Group.1")
# id 0 1 2 val2 val3
# 1 a 3.2 3.3 3.4 10 1
colnames(d.new) <- gsub("^(\\d+)$","val1.\\1",colnames(d.new))
# id val1.0 val1.1 val1.2 val2 val3
# 1 a 3.2 3.3 3.4 10 1
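The renaming step can be tried on a plain character vector to see what the regular expression does. In this sketch the vector nms stands in for colnames(d.new), and the last step, which relabels val2 and val3 as totals to match the names requested in the question, is an addition that is not part of the answer.

```r
# Column names as they come out of the merge step above
nms <- c("id", "0", "1", "2", "val2", "val3")

# Purely numeric names are hour columns: prefix them with "val1."
nms <- gsub("^(\\d+)$", "val1.\\1", nms)

# Optional extra step (not in the answer): label the summed columns as totals
nms <- sub("^(val[23])$", "total.\\1", nms)
nms
```

Anchoring the patterns with ^ and $ keeps them from touching names that merely contain a digit, such as val2 in the first substitution.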
Answer 2 (score: 0)
You can do this while sticking with base R's reshape(), aggregate(), and merge() functions.
Here is a minimal example:
First, some sample data:
set.seed(1) # So you can get the same results that I do
myDF <- data.frame(id = rep(c("a", "b", "c"), each = 48),
datestamp = rep(c("20120101", "20120102"), each = 24),
hrofday = rep(0:23, times = 6),
val1 = runif(144, min = 0, max = 10),
val2 = runif(144, min = 5, max = 15),
val3 = runif(144, min = 0, max = 5))
list(head(myDF), tail(myDF))
# [[1]]
# id datestamp hrofday val1 val2 val3
# 1 a 20120101 0 2.655087 12.293096 0.6611409
# 2 a 20120101 1 3.721239 9.525708 1.1065296
# 3 a 20120101 2 5.728534 6.751268 1.1319040
# 4 a 20120101 3 9.082078 12.466983 0.6570827
# 5 a 20120101 4 2.016819 6.049876 4.9078173
# 6 a 20120101 5 8.983897 13.645449 1.6350686
#
# [[2]]
# id datestamp hrofday val1 val2 val3
# 139 c 20120102 18 9.850952 13.803191 0.4265550
# 140 c 20120102 19 5.076418 8.730634 4.6628596
# 141 c 20120102 20 6.827881 5.479591 4.1919203
# 142 c 20120102 21 6.015412 6.386282 4.3971665
# 143 c 20120102 22 2.388687 8.214921 4.6785623
# 144 c 20120102 23 2.581659 6.548316 0.3623032
#
Second, create the objects to be merged:
## Use `aggregate` to get the totals for `val2` and `val3`. I used the
## `list` structure to be able to define my desired column names
myAggregates <- aggregate(list(total.val2 = myDF$val2, total.val3 = myDF$val3),
list(id = myDF$id, datestamp = myDF$datestamp),
sum, na.rm = TRUE)
myAggregates
# id datestamp total.val2 total.val3
# 1 a 20120101 229.0276 46.44113
# 2 b 20120101 234.9122 61.15198
# 3 c 20120101 238.5162 61.95309
# 4 a 20120102 269.6523 70.49336
# 5 b 20120102 238.5868 61.07377
# 6 c 20120102 198.4762 67.97553
## Use `reshape()` to change from long to wide. Drop `val2` and `val3`
## before reshaping (can be done many ways, I did it here by name matching)
myDFwide <- reshape(myDF[!names(myDF) %in% c("val2", "val3")], direction="wide",
idvar=c("id", "datestamp"), timevar="hrofday")
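To see what reshape() does with the column names, here is a smaller sketch with three hours instead of 24. The data frame long and its values are made up for illustration. Each distinct value of the timevar column is appended to the name of the measured variable, separated by a dot.

```r
# Hypothetical long-format data: two ids, one day, hours 0 to 2
long <- data.frame(id = rep(c("a", "b"), each = 3),
                   datestamp = "20120401",
                   hrofday = rep(0:2, times = 2),
                   val1 = c(3.2, 3.3, 3.4, 1.1, 1.2, 1.3))

# One output row per id/datestamp pair, one val1 column per hour
wide <- reshape(long, direction = "wide",
                idvar = c("id", "datestamp"), timevar = "hrofday")
names(wide)
nrow(wide)
```

Because the hours start at 0, the resulting columns are numbered from val1.0, which matches the layout asked for in the question.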
Third, use merge() to combine the two data.frames. I have posted the output of str() so you can see the variable names and the type of content each one holds.
myDF2 <- merge(myDFwide, myAggregates)
str(myDF2)
# 'data.frame': 6 obs. of 28 variables:
# $ id : Factor w/ 3 levels "a","b","c": 1 1 2 2 3 3
# $ datestamp : Factor w/ 2 levels "20120101","20120102": 1 2 1 2 1 2
# $ val1.0 : num 2.66 2.67 7.32 3.47 4.55 ...
# $ val1.1 : num 3.72 3.86 6.93 3.34 4.1 ...
# $ val1.2 : num 5.729 0.134 4.776 4.764 8.109 ...
# $ val1.3 : num 9.08 3.82 8.61 8.92 6.05 ...
# $ val1.4 : num 2.02 8.7 4.38 8.64 6.55 ...
# $ val1.5 : num 8.98 3.4 2.45 3.9 3.53 ...
# $ val1.6 : num 9.447 4.821 0.707 7.773 2.703 ...
# $ val1.7 : num 6.608 5.996 0.995 9.606 9.927 ...
# $ val1.8 : num 6.29 4.94 3.16 4.35 6.33 ...
# $ val1.9 : num 0.618 1.862 5.186 7.125 2.132 ...
# $ val1.10 : num 2.06 8.27 6.62 4 1.29 ...
# $ val1.11 : num 1.77 6.68 4.07 3.25 4.78 ...
# $ val1.12 : num 6.87 7.94 9.13 7.57 9.24 ...
# $ val1.13 : num 3.84 1.08 2.94 2.03 5.99 ...
# $ val1.14 : num 7.7 7.24 4.59 7.11 9.76 ...
# $ val1.15 : num 4.98 4.11 3.32 1.22 7.32 ...
# $ val1.16 : num 7.18 8.21 6.51 2.45 3.57 ...
# $ val1.17 : num 9.92 6.47 2.58 1.43 4.31 ...
# $ val1.18 : num 3.8 7.83 4.79 2.4 1.48 ...
# $ val1.19 : num 7.774 5.53 7.663 0.589 0.131 ...
# $ val1.20 : num 9.347 5.297 0.842 6.423 7.156 ...
# $ val1.21 : num 2.12 7.89 8.75 8.76 1.03 ...
# $ val1.22 : num 6.517 0.233 3.391 7.789 4.463 ...
# $ val1.23 : num 1.26 4.77 8.39 7.97 6.4 ...
# $ total.val2: num 229 270 235 239 239 ...
# $ total.val3: num 46.4 70.5 61.2 61.1 62 ...