I have a data frame D with about 100 rows, each set up to represent a different lottery, like this:
pA0 pA1 pA2 A0 A1 A2
1 0.625 0.000 0.375 1 20 41
2 0.375 0.625 0.000 1 20 41
3 0.000 1.000 0.000 1 20 41
4 0.125 0.750 0.125 1 20 41
5 0.500 0.375 0.125 1 20 41
6 0.250 0.750 0.000 1 20 41
7 0.250 0.625 0.125 1 20 41
8 0.250 0.250 0.500 1 20 41
9 0.125 0.375 0.500 1 20 41
10 0.125 0.250 0.625 1 20 41
...
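Here the p variables give the probability that the outcome with the same suffix occurs in the lottery. So for lottery 1, there is a 62.5% chance (pA0) that lottery A results in an outcome of 1 (A0), a 0% chance (pA1) that it results in an outcome of 20 (A1), and a 37.5% chance (pA2) that it results in an outcome of 41 (A2). The same applies to all the other lotteries.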
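What I want to do is create a new data frame, say E, that takes the lotteries from D but reorders each one so that suffix 2 holds the highest outcome with positive probability, suffix 1 holds the second-highest outcome with positive probability, and suffix 0 holds the lowest outcome with positive probability. For example, row 1 would now be: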
pA0 pA1 pA2 A0 A1 A2
1 0.000 0.625 0.375 20 1 41
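If a lottery has an outcome with probability 0, that outcome needs to be ranked last (pA0, A0). If it has several outcomes with probability 0, it doesn't matter how they are ranked relative to one another, as long as the outcome with positive probability gets rank 2.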
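I'm pretty sure I could do this with a lot of nested if or ifelse statements, but I'd really like to find a solution that doesn't need them. Bonus points if it generalizes to n outcomes per lottery.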
Answer 0 (score: 1)
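We use grepl to create a logical index of the column names that start with 'p'. Looping over the rows, we multiply the p columns by the non-p columns, take the order of the products, and use it to rearrange the values in each row.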
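E <- D
i1 <- grepl('^p', names(D))          # TRUE for the probability columns
E[] <- t(apply(D, 1, function(x) {
  i2 <- order(x[i1] * x[!i1])        # sort key: probability * outcome
  c(x[i1][i2], x[!i1][i2])           # reordered probabilities, then outcomes
}))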
head(E,2)
# pA0 pA1 pA2 A0 A1 A2
#1 0 0.625 0.375 20 1 41
#2 0 0.375 0.625 41 1 20
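To see what the sort key does, here is a small walk-through on single rows, with the values typed in by hand from the sample data. The names p, o, key and idx are only for illustration and are not part of the answer's code.

```r
# Row 1 of D, entered by hand
p <- c(pA0 = 0.625, pA1 = 0, pA2 = 0.375)
o <- c(A0 = 1, A1 = 20, A2 = 41)

key <- p * o                 # 0.625, 0, 15.375
idx <- order(key)            # 2 1 3: the zero-probability outcome goes first
p_sorted <- unname(p[idx])   # 0.000 0.625 0.375
o_sorted <- unname(o[idx])   # 20 1 41

# Row 4 of D: every probability is positive
p4 <- c(0.125, 0.75, 0.125)
key4 <- p4 * o               # 0.125, 15, 5.125
idx4 <- order(key4)          # 1 3 2
```

For row 4 the key puts outcome 41 in the middle slot, because the rows are ordered by probability times outcome rather than by outcome alone.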
D <- structure(list(pA0 = c(0.625, 0.375, 0, 0.125, 0.5, 0.25, 0.25,
0.25, 0.125, 0.125), pA1 = c(0, 0.625, 1, 0.75, 0.375, 0.75,
0.625, 0.25, 0.375, 0.25), pA2 = c(0.375, 0, 0, 0.125, 0.125,
0, 0.125, 0.5, 0.5, 0.625), A0 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), A1 = c(20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L), A2 = c(41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L
)), .Names = c("pA0", "pA1", "pA2", "A0", "A1", "A2"),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Answer 1 (score: 0)
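Building on @akrun's idea of using the apply function, but ordering the outcomes with nonzero probability by value rather than by expected value.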
E <- D
# The number of columns divided by 2 is the number of outcomes
n <- ncol(D) / 2
E[] <- t(apply(E, 1, function(x) {
  # x is the row: the first n elements are probabilities, the second n
  # elements are the corresponding outcomes
  uo <- c() # outcomes with probability 0 (left unordered)
  up <- c() # their probabilities (all 0)
  oo <- c() # outcomes with positive probability (to be ordered)
  op <- c() # their probabilities
  for (i in seq_len(n)) { # loop through the probabilities
    if (x[i] != 0) { # positive probability: this outcome needs to be ordered
      op <- c(op, x[i])
      oo <- c(oo, x[i + n])
    } else { # probability 0: this outcome is not ordered
      up <- c(up, x[i])
      uo <- c(uo, x[i + n])
    }
  }
  r <- order(oo) # ascending order of the positive-probability outcomes
  p <- c(up, op[r]) # probabilities, with the zeros in the lowest suffixes
  o <- c(uo, oo[r]) # outcomes, with zero-probability outcomes in the lowest suffixes
  c(p, o)
}))
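The same ranking can probably be written without the explicit loop. The following is a minimal sketch, assuming the same column layout (probability columns start with "p", outcome columns follow in the same order). It uses a two-part key in order(): first whether the probability is positive, then the outcome value. The sample data is the first four rows of D, and the names is_p and idx are made up for this example.

```r
# First four rows of D from the question
D <- data.frame(
  pA0 = c(0.625, 0.375, 0, 0.125),
  pA1 = c(0, 0.625, 1, 0.75),
  pA2 = c(0.375, 0, 0, 0.125),
  A0 = 1, A1 = 20, A2 = 41
)

is_p <- grepl("^p", names(D))   # TRUE for the probability columns

E <- D
E[] <- t(apply(D, 1, function(x) {
  p <- x[is_p]
  o <- x[!is_p]
  # FALSE sorts before TRUE, so zero-probability outcomes come first;
  # ties within each group are broken by the outcome value
  idx <- order(p > 0, o)
  c(p[idx], o[idx])
}))
```

Unlike the loop, this also sorts the zero-probability outcomes among themselves by value, which the question says does not matter. Since nothing depends on the number of columns, it should carry over to n outcomes per lottery as long as the probability and outcome columns come in matching order.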
Data:
head(D,10)
pA0 pA1 pA2 A0 A1 A2
1 0.625 0.000 0.375 1 20 41
2 0.375 0.625 0.000 1 20 41
3 0.000 1.000 0.000 1 20 41
4 0.125 0.750 0.125 1 20 41
5 0.500 0.375 0.125 1 20 41
6 0.250 0.750 0.000 1 20 41
7 0.250 0.625 0.125 1 20 41
8 0.250 0.250 0.500 1 20 41
9 0.125 0.375 0.500 1 20 41
10 0.125 0.250 0.625 1 20 41
head(E,10)
pA0 pA1 pA2 A0 A1 A2
1 0.000 0.625 0.375 20 1 41
2 0.000 0.375 0.625 41 1 20
3 0.000 0.000 1.000 1 41 20
4 0.125 0.750 0.125 1 20 41
5 0.500 0.375 0.125 1 20 41
6 0.000 0.250 0.750 41 1 20
7 0.250 0.625 0.125 1 20 41
8 0.250 0.250 0.500 1 20 41
9 0.125 0.375 0.500 1 20 41
10 0.125 0.250 0.625 1 20 41
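One way to sanity-check a result like this is a small row-level test of the ranking rule from the question. This is only a sketch: is_ranked is a hypothetical helper, and it assumes each row holds n probabilities followed by the n matching outcomes.

```r
# Hypothetical helper: TRUE if a row (n probabilities, then n outcomes)
# has its zero-probability outcomes first and its positive-probability
# outcomes in ascending order.
is_ranked <- function(x, n = length(x) / 2) {
  p <- x[seq_len(n)]
  o <- x[n + seq_len(n)]
  pos <- p > 0
  zeros_first <- !is.unsorted(as.integer(pos))
  outcomes_ascending <- !is.unsorted(o[pos])
  zeros_first && outcomes_ascending
}

# First three rows of E as printed above, entered by hand
E3 <- rbind(
  c(0, 0.625, 0.375, 20, 1, 41),
  c(0, 0.375, 0.625, 41, 1, 20),
  c(0, 0, 1, 1, 41, 20)
)
ok <- apply(E3, 1, is_ranked)   # one logical value per row
```

On a full data frame the same call would be apply(E, 1, is_ranked), and all(ok) then says whether every lottery follows the rule.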