I have a data.frame that looks similar to this:
Y date value1 value2
a 2013-01-01 28.857326 9.0206351
a 2013-01-02 13.675526 5.7823725
a 2013-01-03 20.115434 9.3267285
a 2013-01-04 -4.255547 0.9174301
a 2013-01-05 20.898522 9.7821027
b 2013-01-01 5.478783 27.0027194
b 2013-01-02 21.195939 -14.8786857
b 2013-01-03 -4.407236 18.9189197
b 2013-01-04 25.910805 1.0627444
b 2013-01-05 -2.511209 39.0908554
I want to calculate (value1 * value2) / sum(value1), where sum(value1) is the sum within each date only. For example, the first row should be computed as (28.857326 * 9.0206351) / (28.857326 + 5.478783).
I tried both of these:
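As a quick check of the arithmetic for that first row, here is a minimal sketch that plugs in the 2013-01-01 values from the table above. The variable names are only illustrative.

```r
# Values for 2013-01-01 copied from the table above
v1_a <- 28.857326   # value1 for y = "a"
v2_a <- 9.0206351   # value2 for y = "a"
v1_b <- 5.478783    # value1 for y = "b" on the same date

# Numerator uses the row's own values; denominator sums value1 over the date
calc_a <- (v1_a * v2_a) / (v1_a + v1_b)
calc_a
```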
ddply(x, .(date), summarize, freq=length(date), calc=(value1 * value2) / sum(value1))
and
ddply(x, .(date), summarize, calc=(value1 * value2) / sum(value1))
but the first call gives an error and the second gives the wrong result.
Here is the code to generate the dummy data:
a <- rnorm(10, 10, 10)
b <- rnorm(10, 10, 10)
x <- data.frame(y=c(rep("a", times=5), rep("b", times=5)), date=c(seq(as.Date("2013-01-01"), as.Date("2013-01-05"), by="days")), value1=a, value2=b)
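Answer 0 (score: 2)
Your second line works and gives the "expected" result. The first fails because length(date) returns a single value for each date group, while calc is a vector of length 2, so summarize cannot combine the two into columns of equal length. Since you want a result for every row of the data.frame, you should use transform instead of summarise: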
ddply(x, .(date), transform, freq=length(date), calc=(value1 * value2) / sum(value1))
y date value1 value2 freq calc
1 a 2013-01-01 8.0886946 -4.498656 2 -2.376917
2 b 2013-01-01 7.2203152 1.222322 2 0.576494
3 a 2013-01-02 7.9971361 -5.675020 2 -1.757606
4 b 2013-01-02 17.8242945 26.489059 2 18.285152
5 a 2013-01-03 3.0401349 10.495623 2 1.283746
6 b 2013-01-03 21.8153403 14.648083 2 12.856439
7 a 2013-01-04 14.4831518 -2.812941 2 -2.685447
8 b 2013-01-04 0.6875999 27.397730 2 1.241776
9 a 2013-01-05 6.2625381 19.979980 2 8.386698
10 b 2013-01-05 8.6569681 11.385124 2 6.606161
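For comparison, the same per-row result can be sketched in base R with ave(), which returns a vector as long as its input and so behaves like transform rather than summarise. This is only an illustration and not part of the original answer: set.seed() and the explicit rep() for the dates are additions so the example is repeatable and self-contained.

```r
# Dummy data in the same shape as the question's `x`
# (set.seed is an assumption added here for repeatability)
set.seed(1)
x <- data.frame(
  y      = rep(c("a", "b"), each = 5),
  date   = rep(seq(as.Date("2013-01-01"), as.Date("2013-01-05"), by = "days"),
               times = 2),
  value1 = rnorm(10, 10, 10),
  value2 = rnorm(10, 10, 10)
)

# ave() applies FUN within each date and repeats the result for every row
x$freq <- ave(x$value1, x$date, FUN = length)                      # rows per date
x$calc <- x$value1 * x$value2 / ave(x$value1, x$date, FUN = sum)   # per-date sum

x[order(x$date), ]
```

Because the group sum is repeated for each row in the group, every row keeps its own numerator while sharing the per-date denominator.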
Answer 1 (score: 2)
Using data.table:
library(data.table)
x<-data.table(x)
x[,list(freq=length(date),cal=(value1*value2)/sum(value1)),keyby="date"]
date freq cal
1: 2013-01-01 1 -3.94483543
2: 2013-01-01 1 10.83779796
3: 2013-01-02 1 2.33439622
4: 2013-01-02 1 10.62941740
5: 2013-01-03 1 2.97776304
6: 2013-01-03 1 0.06035661
7: 2013-01-04 1 1.59372587
8: 2013-01-04 1 7.17029644
9: 2013-01-05 1 -0.64156778
10: 2013-01-05 1 -1.23650898
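Note that freq is 1 in every row of the output above. This appears to be because a grouping column has length 1 inside j when by or keyby is used, so length(date) does not count the rows in the group. A minimal sketch, assuming the intent was the number of rows per date, uses .N instead. The set.seed() call and the extra y column are additions for illustration only.

```r
library(data.table)

# Dummy data in the same shape as the question's `x`
# (set.seed is an assumption added here for repeatability)
set.seed(1)
x <- data.table(
  y      = rep(c("a", "b"), each = 5),
  date   = rep(seq(as.Date("2013-01-01"), as.Date("2013-01-05"), by = "days"),
               times = 2),
  value1 = rnorm(10, 10, 10),
  value2 = rnorm(10, 10, 10)
)

# .N is the number of rows in the current group; it is recycled to match cal
res <- x[, list(y = y, freq = .N, cal = (value1 * value2) / sum(value1)),
         keyby = "date"]
res
```

Keeping y in the result makes it easier to tell the two rows for each date apart.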