Ranking a melted data frame in R

Time: 2018-02-09 18:17:12

Tags: r dataframe bioinformatics

My data frame looks like this:

> head(female.meth.ordered)
        Var1                                     Var2      value RankMeth
1 cg25296477 ES__WA09_passage39_Female____87.1429.1.1 0.85581970        1
2 cg01003813 ES__WA09_passage39_Female____87.1429.1.1 0.91677790        1
3 cg13176022 ES__WA09_passage39_Female____87.1429.1.1 0.04714496        1
4 cg26484667 ES__WA09_passage39_Female____87.1429.1.1 0.85785770        1
5 cg21028156 ES__WA09_passage39_Female____87.1429.1.1 0.04065772        1
6 cg11503671 ES__WA09_passage39_Female____87.1429.1.1 0.82933710        1

This data frame has 606,528 rows. The Var2 column contains 54 unique sample names.

> unique(female.meth.ordered$Var2)

[1] ES__WA09_passage39_Female____87.1429.1.1                   
 [2] ES__WA09_passage39_Female____87.1429.2.1                   
 [3] ES__MEL4_passage35_Female____127.378.3.1                   
 [4] ES__CSC14_passage29_Female____197.1296.1.2                 
 [5] ES__CM6_passage19_Female____244.622.1.1                    
 [6] ES__HES2_passage105_Female____32.135.4.1  
54 Levels: ES.parthenote__LLC15_passage45_Female____317.905.1.1 ...

I want to assign rank 1 in the RankMeth column to the first 10 unique values in the Var2 column, then rank 2 to the next 10 unique values in Var2, and so on for ranks 3, 4, 5.

1 Answer:

Answer 0: (score: 2)

The simplest solution is probably the following.

Approach:

Take the unique values of Var2, divide row_number() by 10, and round up with ceiling(). This gives every block of 10 unique sample names one rank (1 for the first 10, 2 for the next 10, and so on). Call the result meth_rank.

Join meth_rank with female.meth.ordered to find the corresponding RankMeth for each row:

    library(dplyr)

    meth_rank <- unique(female.meth.ordered$Var2) %>%
      as.data.frame() %>%
      mutate(RankMeth = ceiling(row_number() / 10))
    colnames(meth_rank) <- c("Var2", "RankMeth")

    # Join meth_rank with female.meth.ordered to populate rank.
    female.meth.ordered %>%
      select(-RankMeth) %>%
      inner_join(meth_rank, by = "Var2")

    # Result will be generated with headings as
    # Var1 Var2 value RankMeth
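The same idea can be checked on a small made-up data set. The sketch below is not the answerer's code: it uses base R only, with match() standing in for the join, and the names toy and group_size (and the sample and probe labels) are hypothetical. The group size is set to 2 instead of 10 so the result stays short.

```r
# Toy stand-in for female.meth.ordered: 5 made-up samples, 3 probes each.
toy <- data.frame(
  Var1  = rep(c("cgA", "cgB", "cgC"), times = 5),
  Var2  = factor(rep(paste0("sample", 1:5), each = 3),
                 levels = paste0("sample", 1:5)),
  value = seq(0.1, 0.9, length.out = 15)
)

group_size <- 2  # assumption for the toy data; the question uses 10

# Unique sample names, in order of first appearance.
samples <- unique(toy$Var2)

# match() gives each row the position of its sample within `samples`;
# dividing by the group size and rounding up turns positions into ranks.
toy$RankMeth <- ceiling(match(toy$Var2, samples) / group_size)

# One row per sample with its assigned rank.
unique(toy[, c("Var2", "RankMeth")])
```

Here sample1 and sample2 get rank 1, sample3 and sample4 get rank 2, and sample5 gets rank 3. With the 54 samples from the question and a group size of 10, the same arithmetic yields ranks 1 to 6, with only 4 samples in the last group.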
