Suppose I have a data.table like the one below (you can think of w as a grouping variable):
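set.seed(1)
prQ = CJ(Q1 = 1:10, Q2 = 1:10, w = 1:2)
# CJ() gives 200 rows, so the 100 draws are repeated explicitly to fill them
prQ[, pQ := rep(runif(100, 0, 1), 2)]
prQ[, pQ := pQ/sum(pQ), by = w]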
> prQ
     Q1 Q2 w          pQ
  1:  1  1 1 0.004889560
  2:  1  1 2 0.007553012
  3:  1  2 1 0.010549565
  4:  1  2 2 0.018433927
  5:  1  3 1 0.003714138
 ---
196: 10  8 2 0.016183006
197: 10  9 1 0.008384253
198: 10  9 2 0.008323492
199: 10 10 1 0.014932841
200: 10 10 2 0.012278353
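For a given w, how can I compute a new column called CDF that does the following?
For example, take Q1 = 4 and Q2 = 6. The new column CDF should equal sum(pQ) over all rows with Q1 <= 4 and Q2 <= 6, holding w fixed.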
For example, for a single row (here with w = 1):
CDF0 = sum(prQ[Q1<=4 & Q2<=6 & w==1, pQ])
prQ[Q1==4 & Q2==6 & w==1, CDF:=CDF0]
I want to do this for every row, within each w.
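Written out as a formula, the request reads roughly as follows. The symbol F_w is only shorthand introduced here for the CDF column within group w; it does not appear in the code.

```latex
F_w(q_1, q_2) \;=\; \sum_{\substack{Q1 \le q_1 \\ Q2 \le q_2}} pQ(Q1, Q2, w),
\qquad \text{e.g. } F_1(4, 6) = \sum_{Q1 \le 4,\; Q2 \le 6} pQ(Q1, Q2, 1).
```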
The desired output, produced by brute force:
for(w0 in 1:2){
    for(j in 1:10){
        for(p in 1:10){
            CDF0 = sum(prQ[Q1<=j & Q2<=p & w==w0, pQ])
            prQ[Q1==j & Q2==p & w==w0, CDF:=CDF0]
        }
    }
}
> head(prQ)
   Q1 Q2 w          pQ         CDF
1:  1  1 1 0.004889560 0.004889560
2:  1  1 2 0.007553012 0.007553012
3:  1  2 1 0.010549565 0.015439125
4:  1  2 2 0.018433927 0.025986939
5:  1  3 1 0.003714138 0.019153263
6:  1  3 2 0.018234648 0.044221587
Answer 0 (score: 1)
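One possible approach is to sum every top-left sub-matrix of the matrix built from the pQ values (number of rows = number of unique Q2 values, number of columns = number of unique Q1 values):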
# Sort so that, within each w, pQ fills the matrix column by column
# (one column per Q1, one row per Q2). Every top-left sub-matrix then holds
# exactly the cells whose Q1 and Q2 do not exceed those of its bottom-right cell.
setorder(prQ, w, Q1, Q2)
# all combinations of column (V1, from Q1) and row (V2, from Q2) indices
subMatIdx <- prQ[, CJ(as.integer(as.factor(Q1)), as.integer(as.factor(Q2)), unique=TRUE)]
# sum every top-left sub-matrix; note that .(Map(...)) stores CDF as a list column
prQ[, CDF := {
        nr <- uniqueN(Q2)
        m <- matrix(pQ, nrow=nr)   # build the matrix once per group
        .(Map(function(i, j) sum(m[1L:j, 1L:i]),
            subMatIdx[["V1"]], subMatIdx[["V2"]]))
    },
    by=.(w)]
Output:
     Q1 Q2 w          pQ        CDF
  1:  1  1 1 0.004889560 0.00488956
  2:  1  2 1 0.010549565 0.01543912
  3:  1  3 1 0.003714138 0.01915326
  4:  1  4 1 0.017396970 0.03655023
  5:  1  5 1 0.011585652 0.04813589
 ---
196: 10  6 2 0.001196193  0.5713282
197: 10  7 2 0.017785668  0.6535378
198: 10  8 2 0.016183006  0.7734989
199: 10  9 2 0.008323492   0.871678
200: 10 10 2 0.012278353          1
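A possible alternative, sketched here rather than taken from the approach above: because the sum over a top-left rectangle is a cumulative sum in two directions, two grouped cumsum() passes may give the same column without building any sub-matrices. This assumes every w contains every (Q1, Q2) combination, which CJ() guarantees in this example. The table name d is hypothetical.

```r
library(data.table)

set.seed(1)
# Hypothetical stand-in for prQ, built the same way on a complete grid
d <- CJ(Q1 = -1:10, Q2 = -1:10, w = 1:2)
d[, pQ := runif(.N)]
d[, pQ := pQ / sum(pQ), by = w]

# Row order matters: within each group the rows must be ascending
setorder(d, w, Q1, Q2)
# Pass 1: running total over Q2 inside each (w, Q1) slice
d[, CDF := cumsum(pQ), by = .(w, Q1)]
# Pass 2: running total of those partial sums over Q1 inside each (w, Q2) slice
d[, CDF := cumsum(CDF), by = .(w, Q2)]

# Spot check one cell against the direct definition
direct <- d[Q1 <= 4 & Q2 <= 6 & w == 1, sum(pQ)]
stopifnot(isTRUE(all.equal(d[Q1 == 4 & Q2 == 6 & w == 1, CDF], direct)))
```

With this variant CDF is a plain numeric column rather than a list column. If some (Q1, Q2) pairs were missing from a group, the second pass would no longer line up and this shortcut should not be relied on.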
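EDIT:
What if Q1 and Q2 are negative, or arbitrary real numbers?
The line that builds subMatIdx should already take care of that, since as.integer(as.factor(...)) turns the values into 1-based positions.
For example: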
set.seed(1)
prQ = CJ(Q1 = -1:10, Q2 = -1:10, w = 1:2)
prQ[, pQ := runif(nrow(prQ), 0, 1)]
prQ[, pQ := pQ/sum(pQ), by = w]
setorder(prQ, w, Q1, Q2)
# all combinations of column (V1, from Q1) and row (V2, from Q2) indices
subMatIdx <- prQ[, CJ(as.integer(as.factor(Q1)),
    as.integer(as.factor(Q2)), unique=TRUE)]
prQ[, CDF := {
        nr <- uniqueN(Q2)
        .(Map(function(i, j) sum(matrix(pQ, nrow=nr)[1L:j, 1L:i]),
            subMatIdx[["V1"]], subMatIdx[["V2"]]))
    },
    by=.(w)]
Output:
     Q1 Q2 w          pQ         CDF
  1: -1 -1 1 0.003607862 0.003607862
  2: -1  0 1 0.007784212  0.01139207
  3: -1  1 1 0.002740553  0.01413263
  4: -1  2 1 0.012836710  0.02696934
  5: -1  3 1 0.008548709  0.03551805
 ---
284: 10  6 2 0.011164332   0.6425251
285: 10  7 2 0.007638237   0.7360602
286: 10  8 2 0.005403923   0.8270053
287: 10  9 2 0.002008067   0.9193811
288: 10 10 2 0.002242777           1
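To see the index mapping on its own, here is a small base R sketch with made-up values (the vector q is hypothetical and not part of prQ). It only illustrates how the as.integer(as.factor(...)) idiom from the subMatIdx line behaves on negative and non-integer input.

```r
# Hypothetical grid values: negative, zero and non-integer, with repeats
q <- c(2.5, -1.75, 0, 2.5, -1.75)

# as.factor() sorts the unique values into levels; as.integer() then returns
# each value's level number, i.e. its 1-based position among the sorted
# unique values
ranks <- as.integer(as.factor(q))
print(ranks)
# [1] 3 1 2 3 1
```

The resulting integers are valid matrix indices whatever the original values were, which is what the sub-matrix sums need.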