I'm trying to split this data frame by Date and Id:
Id Date Returns
1 039229109 1996-12-31 0.4739285
2 039229109 1997-01-02 -1.8867910
3 039229109 1997-01-03 0.4807711
4 056180102 1996-12-31 -4.9504940
5 056180102 1997-01-02 2.6041627
6 056180102 1997-01-03 0.0000000
7 096650106 1996-12-31 -2.0872890
8 096650106 1997-01-02 -1.8410861
9 096650106 1997-01-03 1.4807463
so that it looks like this:
Date 039229109 056180102 096650106
1 1996-12-31 0.4739285 -4.950494 -2.087289
2 1997-01-02 -1.8867910 2.604163 -1.841086
3 1997-01-03 0.4807711 0.000000 1.480746
I tried using:
> aggregate(data,by = list(data$Date),identity)
But this returns:
Group.1 Id.1 Id.2 Id.3 Date.1 Date.2 Date.3 Returns.1 Returns.2 Returns.3
1 1996-12-31 039229109 056180102 096650106 9861 9861 9861 0.4739285 -4.9504940 -2.0872890
2 1997-01-02 039229109 056180102 096650106 9863 9863 9863 -1.8867910 2.6041627 -1.8410861
3 1997-01-03 039229109 056180102 096650106 9864 9864 9864 0.4807711 0.0000000 1.4807463
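I'm very new to aggregate and feel like this should be simple, but I can't figure out how to do it. (I tried using reshape, but I didn't understand it and couldn't get any meaningful results.)
Thanks for your help!
Edit: changed and formatted the data.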
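Answer 0 (score: 1)
This is really a reshaping problem rather than an aggregation problem, which is probably why you're having trouble with aggregate. So if this is your sample data: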
data<-structure(list(Id = c("039229109", "039229109", "039229109",
"056180102", "056180102", "056180102", "096650106", "096650106",
"096650106", "172736100", "172736100", "172736100", "208368100",
"208368100", "208368100"), Date = structure(c(9861, 9863, 9864,
9861, 9863, 9864, 9861, 9863, 9864, 9861, 9863, 9864, 9861, 9863,
9864), class = "Date"), fg.total.returnc = c(0.4739285, -1.886791,
0.4807711, -4.950494, 2.6041627, 0, -2.087289, -1.8410861, 1.4807463,
-0.8130074, 0.8196712, 0.8130074, -0.1253128, -0.6273508, 0.1262665
)), .Names = c("Id", "Date", "fg.total.returnc"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15"), class = "data.frame")
then you can use the base function reshape(). For example:
ww <- reshape(data, timevar="Id", idvar="Date", direction="wide")
names(ww) <- gsub("fg.total.returnc.", "", names(ww), fixed = TRUE)
ww
# Date 039229109 056180102 096650106 172736100 208368100
# 1 1996-12-31 0.4739285 -4.950494 -2.087289 -0.8130074 -0.1253128
# 2 1997-01-02 -1.8867910 2.604163 -1.841086 0.8196712 -0.6273508
# 3 1997-01-03 0.4807711 0.000000 1.480746 0.8130074 0.1262665
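A minimal base-R sketch of the same idea, applied to the nine rows from the question with its original column name Returns (the data frame name returns_long is a hypothetical choice). It first checks that every (Date, Id) pair is unique, which is why nothing needs to be aggregated, then reshapes and strips the "Returns." prefix that reshape() adds to the new column names:

```r
# Hypothetical long-format data: the nine rows shown in the question
returns_long <- data.frame(
  Id      = rep(c("039229109", "056180102", "096650106"), each = 3),
  Date    = as.Date(rep(c("1996-12-31", "1997-01-02", "1997-01-03"), times = 3)),
  Returns = c(0.4739285, -1.8867910, 0.4807711,
              -4.9504940, 2.6041627, 0.0000000,
              -2.0872890, -1.8410861, 1.4807463),
  stringsAsFactors = FALSE
)

# Each (Date, Id) pair occurs exactly once, so this is a pure reshape, not an aggregation
stopifnot(anyDuplicated(returns_long[c("Date", "Id")]) == 0)

# Long -> wide: one row per Date, one Returns column per Id
wide <- reshape(returns_long, timevar = "Id", idvar = "Date", direction = "wide")

# reshape() names the new columns "Returns.<Id>"; strip that prefix
names(wide) <- sub("^Returns\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

Any (Date, Id) combination absent from the long data would show up as NA in the wide result.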
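This is a bit awkward because your example swaps the usual roles of Id and Date: Id has to be passed to reshape() as the "time" variable. I think a better approach is to use the reshape2 library: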
library(reshape2)
dcast(data, Date ~ Id, value.var = "fg.total.returnc")
# Date 039229109 056180102 096650106 172736100 208368100
# 1 1996-12-31 0.4739285 -4.950494 -2.087289 -0.8130074 -0.1253128
# 2 1997-01-02 -1.8867910 2.604163 -1.841086 0.8196712 -0.6273508
# 3 1997-01-03 0.4807711 0.000000 1.480746 0.8130074 0.1262665
Answer 1 (score: 0)
A nice option in base R is xtabs:
> xtabs(fg.total.returnc ~ Date + Id, data)
Id
Date 039229109 056180102 096650106 172736100 208368100
1996-12-31 0.4739285 -4.9504940 -2.0872890 -0.8130074 -0.1253128
1997-01-02 -1.8867910 2.6041627 -1.8410861 0.8196712 -0.6273508
1997-01-03 0.4807711 0.0000000 1.4807463 0.8130074 0.1262665
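The result is a matrix of class xtabs and table, so if you want a data.frame, make sure to wrap the above in as.data.frame.matrix rather than just as.data.frame (the latter dispatches to as.data.frame.table and just gets you back to the long format you started with).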
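To make the difference concrete, here is a small sketch using the nine rows from the question (the data frame name returns_long and the column name Returns are taken as assumptions from the question, not from this answer's data) that shows both conversions side by side:

```r
# Hypothetical long-format data: the nine rows shown in the question
returns_long <- data.frame(
  Id      = rep(c("039229109", "056180102", "096650106"), each = 3),
  Date    = as.Date(rep(c("1996-12-31", "1997-01-02", "1997-01-03"), times = 3)),
  Returns = c(0.4739285, -1.8867910, 0.4807711,
              -4.9504940, 2.6041627, 0.0000000,
              -2.0872890, -1.8410861, 1.4807463),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are dates, columns are Ids, cells hold Returns
tab <- xtabs(Returns ~ Date + Id, data = returns_long)
class(tab)  # "xtabs" "table"

# Keeps the wide layout; the dates become row names
wide <- as.data.frame.matrix(tab)

# Plain as.data.frame() dispatches to as.data.frame.table():
# back to long format with columns Date, Id and Freq
long_again <- as.data.frame(tab)
```

One caveat with this approach: when the formula has a left-hand side, xtabs sums the values in each cell and fills combinations that never occur with 0. A duplicated (Date, Id) pair would therefore be silently added up, and a missing return would look the same as a genuine 0.0000000 such as 056180102 on 1997-01-03. The reshape() approach leaves missing cells as NA instead.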