How to filter rows between two specific values

Time: 2018-01-16 06:18:15

Tags: r

I need help filtering the following data frame (this is a simple example):

mx = as.data.frame(cbind(c("-", "-", "-", "-", "mutation", "+", "+", "+", "+") ,
                         c(F, T, F, F, F, F, T, F,T)) )
colnames(mx) = c("mutation", "distance")
mx
  mutation distance
1        -    FALSE
2        -     TRUE
3        -    FALSE
4        -    FALSE
5 mutation    FALSE
6        +    FALSE
7        +     TRUE
8        +    FALSE
9        +     TRUE

I need to filter based on the second column (distance) so that the result looks like this:

  mutation distance
3        -    FALSE
4        -    FALSE
5 mutation    FALSE
6        +    FALSE

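I need to remove every row up to and including the last TRUE that comes before the row where mx$mutation == "mutation" (so rows 1 and 2), and every row from the first TRUE after that row onward (so row 7 and later).

3 answers:

Answer 0 (score: 1)

We can create a grouping variable by taking the cumulative sum of the logical column ('distance') and then apply filter. This requires distance to be a logical column, as in the Data section below.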

library(dplyr)
mx %>%
  group_by(grp = cumsum(distance)) %>% 
  filter(any(mutation == "mutation") & !distance) %>%
  ungroup %>% 
  select(-grp)
# A tibble: 4 x 2
# mutation distance
#  <fctr>   <lgl>   
#1 -        F       
#2 -        F       
#3 mutation F       
#4 +        F       
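To see why the cumulative sum works as a grouping key, here is a minimal base R sketch on the same example values. It assumes there is exactly one row with mutation == "mutation"; the names grp, target and keep are only illustrative.

```r
# Same values as the example data, as plain vectors
mutation <- c("-", "-", "-", "-", "mutation", "+", "+", "+", "+")
distance <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

# Every TRUE starts a new group, so the counter increases at rows 2, 7 and 9
grp <- cumsum(distance)
grp
# 0 1 1 1 1 1 2 2 3

# Group that contains the mutation row (assumes exactly one such row)
target <- grp[mutation == "mutation"]

# Keep that group, minus the TRUE row that opens it
keep <- grp == target & !distance
which(keep)
# 3 4 5 6
```

The TRUE at row 2 belongs to the same group as the mutation row, which is why the extra !distance condition is needed to drop it.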

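Note: we can create the data.frame directly with data.frame(). There is no need for cbind, which has a negative effect on the column types: cbind converts its input to a matrix, and a matrix can hold only one type.

Data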

mx = data.frame(mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
                distance = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE))
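A quick way to check the point about cbind is to compare the column types produced by the two constructions. This is only a small sketch on a shortened version of the data; m, df_cbind and df_direct are made-up names, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# cbind() on a character and a logical vector yields a character matrix
m <- cbind(c("-", "mutation", "+"), c(FALSE, TRUE, FALSE))
typeof(m)
# "character"

# Converting that matrix keeps the strings "FALSE"/"TRUE", not logicals
df_cbind <- as.data.frame(m, stringsAsFactors = FALSE)
class(df_cbind$V2)
# "character"

# Building the data frame directly preserves the logical column
df_direct <- data.frame(mutation = c("-", "mutation", "+"),
                        distance = c(FALSE, TRUE, FALSE))
class(df_direct$distance)
# "logical"
```

With the character (or factor) version of distance, cumsum(distance) would not work, which is why the answer rebuilds the data.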

Answer 1 (score: 0)

Hope this helps!

https://

Output:

host = 'kbckjsdkcdn.us-east-1.es.amazonaws.com'

Answer 2 (score: 0)

You can use which() to identify the right rows:

# row numbers where distance is TRUE, and the row holding the mutation
true_rows <- which(mx$distance == 'TRUE')
mutation_row <- which(mx$mutation == 'mutation')

# get rownum of last TRUE before mx$mutation == "mutation"
last_true_before_mutation <- max(true_rows[true_rows < mutation_row])

# get rownum of first TRUE after mx$mutation == "mutation"
first_true_after_mutation <- min(true_rows[true_rows > mutation_row])

# all rows to remove
rem_rows <- c(seq_len(last_true_before_mutation), seq(first_true_after_mutation, nrow(mx)))

# remove appropriate rows
mx[-rem_rows, ]


Here is a general function you can use:

before_after_mutation <- function(df) {
    # row numbers where distance is TRUE, and the row holding the mutation
    true_rows <- which(df$distance == 'TRUE')
    mutation_row <- which(df$mutation == 'mutation')
    last_true_before_mutation <- max(true_rows[true_rows < mutation_row])
    first_true_after_mutation <- min(true_rows[true_rows > mutation_row])
    # drop everything up to the last TRUE before, and from the first TRUE after
    rem_rows <- c(seq_len(last_true_before_mutation), seq(first_true_after_mutation, nrow(df)))
    res <- df[-rem_rows,]
    return(res)
}

Usage:

before_after_mutation(mx)
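One caveat: the function above relies on there being at least one TRUE both before and after the mutation row. If either side has none, max() or min() receives an empty vector and returns -Inf or Inf with a warning, and the later steps fail. Below is a sketch of a variant that falls back to the first or last row in that case. filter_around_mutation is a hypothetical name, and taking only the first "mutation" row is my own assumption, not part of the question.

```r
filter_around_mutation <- function(df) {
  # works for logical, character or factor 'distance' columns
  is_true <- as.character(df$distance) == "TRUE"
  mut <- which(df$mutation == "mutation")[1]  # assumption: use the first mutation row
  if (is.na(mut)) stop("no row with mutation == 'mutation'")

  rows <- seq_len(nrow(df))
  before <- which(is_true & rows < mut)  # TRUE rows above the mutation
  after  <- which(is_true & rows > mut)  # TRUE rows below the mutation

  # fall back to the data frame boundaries when one side has no TRUE
  start <- if (length(before) > 0) max(before) + 1 else 1
  end   <- if (length(after) > 0) min(after) - 1 else nrow(df)
  df[start:end, , drop = FALSE]
}

mx <- data.frame(mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
                 distance = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE))
res <- filter_around_mutation(mx)
rownames(res)
# "3" "4" "5" "6"
```

Since start is never past the mutation row and end is never before it, start:end always counts upward and always includes the mutation row.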
