How can I rearrange the elements within the rows of a data frame in R based on a condition?

Asked: 2015-09-16 01:56:59

Tags: r dataframe

I have a data frame D with about 100 rows, each of which represents a different lottery, like this:

     pA0   pA1   pA2 A0 A1 A2 
1  0.625 0.000 0.375  1 20 41
2  0.375 0.625 0.000  1 20 41
3  0.000 1.000 0.000  1 20 41
4  0.125 0.750 0.125  1 20 41
5  0.500 0.375 0.125  1 20 41
6  0.250 0.750 0.000  1 20 41
7  0.250 0.625 0.125  1 20 41
8  0.250 0.250 0.500  1 20 41
9  0.125 0.375 0.500  1 20 41
10 0.125 0.250 0.625  1 20 41
...

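where the p variables give the probability that the lottery produces the outcome with the same suffix. So for lottery 1, there is a 62.5% chance (pA0) that lottery A results in outcome 1 (A0), a 0% chance (pA1) that it results in outcome 20 (A1), and a 37.5% chance (pA2) that it results in outcome 41 (A2). The same applies to all the other lotteries.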

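What I want to do is create a new data frame, say E, that takes the lotteries from D but rearranges each row so that suffix 2 denotes the highest outcome with positive probability, suffix 1 denotes the second-highest outcome with positive probability, and suffix 0 denotes the lowest outcome with positive probability. For example, row 1 would now be: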

     pA0   pA1    pA2 A0 A1 A2
1  0.000 0.625  0.375 20  1 41

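If a lottery has an outcome with probability 0, that outcome needs to be ranked last (pA0, A0). If it has several outcomes with probability 0, it doesn't matter how those are ranked relative to one another, as long as the outcome with positive probability gets rank 2.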

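I'm pretty sure I could do this with lots of nested if and ifelse statements, but I'd really like to find a solution that doesn't need them. Bonus points if it generalizes to n outcomes per lottery.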

2 Answers:

Answer 0 (score: 1)

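We use grepl to create a logical index of the column names that start with 'p'. Looping over the rows, we multiply the p columns by the non-p columns, take the order of those products, and use it to rearrange the values in each row.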

E <- D
i1 <- grepl('^p', names(D))
E[] <- t(apply(D, 1, function(x) {i2 <- order(x[i1]*x[!i1])
                                  c(x[i1][i2], x[!i1][i2])}))
head(E,2)
#  pA0   pA1   pA2 A0 A1 A2
#1   0 0.625 0.375 20  1 41
#2   0 0.375 0.625 41  1 20
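
Because the code above ranks by probability times outcome, it can differ from a ranking by outcome value when all probabilities are positive. The following is a minimal sketch using the values from row 4 of D; the alternative key `order(p > 0, a)` is an assumption about what the question intends (zero-probability outcomes first, then ascending outcome value), not part of the original answer.

```r
# Values taken from row 4 of D
p <- c(0.125, 0.750, 0.125)  # pA0, pA1, pA2
a <- c(1, 20, 41)            # A0,  A1,  A2

by_product <- order(p * a)     # ranking used in the answer above
by_value   <- order(p > 0, a)  # assumed alternative: zeros first, then by outcome

a[by_product]  # 1 41 20
a[by_value]    # 1 20 41
```

For rows 1 and 2 both keys give the same arrangement, which is why the head(E,2) output above matches the expected result.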

Data

D <- structure(list(pA0 = c(0.625, 0.375, 0, 0.125, 0.5, 0.25, 0.25, 
0.25, 0.125, 0.125), pA1 = c(0, 0.625, 1, 0.75, 0.375, 0.75, 
0.625, 0.25, 0.375, 0.25), pA2 = c(0.375, 0, 0, 0.125, 0.125, 
0, 0.125, 0.5, 0.5, 0.625), A0 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), A1 = c(20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L), A2 = c(41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L
)), .Names = c("pA0", "pA1", "pA2", "A0", "A1", "A2"), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

Answer 1 (score: 0)

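This builds on @akrun's idea of using the apply function, but orders the outcomes with nonzero probability by their value rather than by their expected value.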

E <-  D

# The number of columns divided by 2 is the number of outcomes
n <- ncol(D) / 2

E[] <- t(apply(E, 1, function(x) {

            # x is the row , first n elements are probs, the second n
            # elements are the corresponding outcomes

            uo <- c()   # vector for unordered outcomes
            up <- c()   # vector for unordered probabilities
            oo <- c()   # vector for ordered outcomes
            op <- c()   # vector for ordered probabilities

            for (i in 1:n){             # Loop through probabilities
                if( x[i] != 0){         # if probability isn't 0, it needs to be ordered
                    op <- c(op, x[i])   # add the probability to the vector
                    oo <- c(oo, x[i+n]) # add the outcome to the vector
                }
                else{                   # if the probability is 0, it isn't ordered
                    up <- c(up, x[i] )  
                    uo <- c(uo, x[i+n] )
                }
            }

            r <- order(oo)  # Order the elements of the outcomes vector that need to be ordered

            p <- c(up, op[r]) # probabilities: zero-probability entries first, then the ordered ones
            o <- c(uo, oo[r]) # outcomes: zero-probability outcomes first, then ascending by value

            c(p,o)

        }))

Data:

head(D,10)
     pA0   pA1   pA2 A0 A1 A2
1  0.625 0.000 0.375  1 20 41
2  0.375 0.625 0.000  1 20 41
3  0.000 1.000 0.000  1 20 41
4  0.125 0.750 0.125  1 20 41
5  0.500 0.375 0.125  1 20 41
6  0.250 0.750 0.000  1 20 41
7  0.250 0.625 0.125  1 20 41
8  0.250 0.250 0.500  1 20 41
9  0.125 0.375 0.500  1 20 41
10 0.125 0.250 0.625  1 20 41

head(E,10)
     pA0   pA1   pA2 A0 A1 A2
1  0.000 0.625 0.375 20  1 41
2  0.000 0.375 0.625 41  1 20
3  0.000 0.000 1.000  1 41 20
4  0.125 0.750 0.125  1 20 41
5  0.500 0.375 0.125  1 20 41
6  0.000 0.250 0.750 41  1 20
7  0.250 0.625 0.125  1 20 41
8  0.250 0.250 0.500  1 20 41
9  0.125 0.375 0.500  1 20 41
10 0.125 0.250 0.625  1 20 41
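
The loop in this answer can probably be written more compactly with a two-key `order()`, which also works for any number of outcomes per lottery. This is only a sketch under two assumptions: the probability columns are exactly those whose names start with "p", and they appear in the same order as their outcome columns. The helper name `reorder_row` is hypothetical, and only the first three rows of D are rebuilt here to keep the example short.

```r
# First three rows of D, rebuilt so the example is self-contained
D <- data.frame(
  pA0 = c(0.625, 0.375, 0.000),
  pA1 = c(0.000, 0.625, 1.000),
  pA2 = c(0.375, 0.000, 0.000),
  A0  = c(1, 1, 1),
  A1  = c(20, 20, 20),
  A2  = c(41, 41, 41)
)

is_p <- grepl("^p", names(D))  # TRUE for probability columns

# Hypothetical helper: zero-probability outcomes first, then ascending by outcome
reorder_row <- function(x) {
  p <- x[is_p]
  a <- x[!is_p]
  idx <- order(p > 0, a)
  unname(c(p[idx], a[idx]))
}

E <- D
E[] <- t(apply(as.matrix(D), 1, reorder_row))
E
```

On these three rows the result should match rows 1 to 3 of the head(E,10) output shown above.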