Question

I have data that looks like this:

GeneID Score
ABC     0.1
EFH     0.2
ABC     0.5
STY     0.1
TRQ     0.2
TRQ     0.1
EFH     0.5
EFH     0.1
EFH     0.01

I would like to get the frequency of column 1 within bin ranges of column 2, like this:

<=0.1             4
>0.1 and <=0.5    4

Column 1 contains redundant values. If a particular value in column 1 appears twice within the same range, how can I count it only once?

Answer 1

Assuming your data frame is called df, this is what I would do:

library(dplyr)

df <- df %>%
  mutate(bin = ifelse(Score <= 0.1, "(,0.1]", ifelse(Score <= 0.5, "(0.1,0.5]", "(0.5,]"))) %>%
  group_by(bin) %>%
  summarise(N = n())

This returns:

Source: local data frame [2 x 2]

        bin N
1    (,0.1] 5
2 (0.1,0.5] 4
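
Note that the counts above are row counts, so a GeneID that falls into the same bin more than once is counted each time. A minimal sketch of counting each GeneID once per bin, assuming the sample data from the question and swapping n() for dplyr's n_distinct() (the name distinct_counts is only illustrative):

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  GeneID = c("ABC", "EFH", "ABC", "STY", "TRQ", "TRQ", "EFH", "EFH", "EFH"),
  Score  = c(0.1, 0.2, 0.5, 0.1, 0.2, 0.1, 0.5, 0.1, 0.01)
)

distinct_counts <- df %>%
  mutate(bin = ifelse(Score <= 0.1, "(,0.1]",
               ifelse(Score <= 0.5, "(0.1,0.5]", "(0.5,]"))) %>%
  group_by(bin) %>%
  # n_distinct() counts each GeneID only once within a bin
  summarise(N = n_distinct(GeneID))

distinct_counts
```

On the sample data this should give 4 for (,0.1] and 3 for (0.1,0.5], because EFH has two scores (0.2 and 0.5) in the second bin.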

Answer 2

You don't need any ifelse statements here; just use cut:

table(droplevels(cut(df$Score, c(-Inf, .1, .5, Inf))))
# (-Inf,0.1]  (0.1,0.5] 
#          5          4

However, if Score is as restricted as in the data set provided, you can simply use table on a condition:

setNames(table(df$Score > 0.1), c(" <= 0.1", "> 0.1"))
# <= 0.1   > 0.1 
#      5       4
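
Both calls above count rows, so repeated GeneID values within a bin are included. A rough base R sketch of counting each GeneID once per bin, assuming the sample data from the question: keep one row per (GeneID, bin) pair with unique() before tabulating (pairs and counts are illustrative names).

```r
# Sample data from the question
df <- data.frame(
  GeneID = c("ABC", "EFH", "ABC", "STY", "TRQ", "TRQ", "EFH", "EFH", "EFH"),
  Score  = c(0.1, 0.2, 0.5, 0.1, 0.2, 0.1, 0.5, 0.1, 0.01)
)

# Assign each score to a bin with the same breaks as above
df$bin <- cut(df$Score, c(-Inf, .1, .5, Inf))

# Keep one row per (GeneID, bin) pair, then tabulate the bins
pairs <- unique(df[, c("GeneID", "bin")])
counts <- table(droplevels(pairs$bin))
counts
```

With the sample data the two bins should come out as 4 and 3 rather than 5 and 4.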

Answer 3

You could use the plyr package:

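library(plyr)
# Per-gene relative frequency and score range; `data` is the data frame from the question
ddply(data, .(GeneID), summarize, frequency = length(GeneID) / nrow(data),
                                  range = max(Score) - min(Score))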

Frequency of a column within ranges of another column in R
