Summarise a data frame by finding the percentage of a specific value of a variable for each unique ID

Time: 2017-07-26 13:54:15

Tags: r dplyr

For each unique transcript_id, I want to find the fraction of rows where the ore column is sense:

transcript_id    ore      
A1               sense        
A1               sense        
A1               antisense      
A2               sense      
A2               antisense      
A3               sense     
A4               antisense      
A4               antisense  

Expected output:

 transcript_id    fraction  
        A1            0.66
        A2            0.5
        A3            1
        A4            0

1 answer:

Answer 0 (score: 4)

df %>% group_by(transcript_id) %>% summarise(fraction = sum(ore == "sense")/n())

# A tibble: 4 x 2
#  transcript_id  fraction
#         <fctr>     <dbl>
#1            A1 0.6666667
#2            A2 0.5000000
#3            A3 1.0000000
#4            A4 0.0000000

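This is equivalent to the following, which uses mean as suggested in a comment by @docendo (assuming there are no missing values in ore):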

df %>% group_by(transcript_id) %>% summarise(fraction = mean(ore == "sense"))

# A tibble: 4 x 2
#  transcript_id  fraction
#         <fctr>     <dbl>
#1            A1 0.6666667
#2            A2 0.5000000
#3            A3 1.0000000
#4            A4 0.0000000
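
If ore can contain missing values, the two forms stop agreeing once na.rm = TRUE is added, because they divide by different counts. The sketch below illustrates this under some assumptions: the A5 rows (one of them NA) are hypothetical and not part of the question's data, and the column names frac_all and frac_known are made up for illustration.

```r
library(dplyr)

# Data from the question, plus a hypothetical transcript A5 with one missing ore value
df <- data.frame(
  transcript_id = c("A1", "A1", "A1", "A2", "A2", "A3", "A4", "A4", "A5", "A5"),
  ore = c("sense", "sense", "antisense", "sense", "antisense",
          "sense", "antisense", "antisense", "sense", NA),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(transcript_id) %>%
  summarise(
    # NA rows are ignored in the numerator but still counted by n()
    frac_all = sum(ore == "sense", na.rm = TRUE) / n(),
    # NA rows are dropped from both numerator and denominator
    frac_known = mean(ore == "sense", na.rm = TRUE)
  )

print(res)
```

For A5, frac_all divides by both rows while frac_known divides only by the one non-missing row, so which form to use depends on whether rows with a missing ore should count toward the total. Without na.rm = TRUE, both forms return NA for any group that contains a missing value.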