I need help filtering the following data frame (this is a simplified example):
mx = as.data.frame(cbind(c("-", "-", "-", "-", "mutation", "+", "+", "+", "+") ,
c(F, T, F, F, F, F, T, F,T)) )
colnames(mx) = c("mutation", "distance")
mx
mutation distance
1 - FALSE
2 - TRUE
3 - FALSE
4 - FALSE
5 mutation FALSE
6 + FALSE
7 + TRUE
8 + FALSE
9 + TRUE
I need to filter it based on the second column (distance), so that it looks like this:
mutation distance
3 - FALSE
4 - FALSE
5 mutation FALSE
6 + FALSE
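I need to remove all rows up to and including the last row with a TRUE value before the row where mx$mutation == "mutation" (so rows 1 and 2), and all rows from the first TRUE after that row onward (so row 7 and beyond).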
Answer 0: (score: 1)
We can create a grouping variable by taking the cumulative sum of the logical column ('distance') and then filter on it:
library(dplyr)
mx %>%
group_by(grp = cumsum(distance)) %>%
filter(any(mutation == "mutation") & !distance) %>%
ungroup %>%
select(-grp)
# A tibble: 4 x 2
# mutation distance
# <fctr> <lgl>
#1 - F
#2 - F
#3 mutation F
#4 + F
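To see why the cumulative sum works as a grouping variable, here is a minimal base R sketch of the same idea. It assumes distance is stored as a logical vector; the names grp, in_mutation_grp, and keep are only illustrative.

```r
# Assumed inputs: the two columns from the question, with distance as logical
mutation <- c("-", "-", "-", "-", "mutation", "+", "+", "+", "+")
distance <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

# Every TRUE starts a new group, so the counter steps up at rows 2, 7 and 9
grp <- cumsum(distance)
grp
# 0 1 1 1 1 1 2 2 3

# Rows that share a group with the "mutation" row
in_mutation_grp <- grp %in% grp[mutation == "mutation"]

# Keep that group, minus the TRUE row that opens it
keep <- in_mutation_grp & !distance
which(keep)
# 3 4 5 6
```

The group containing the mutation row always starts at the last TRUE before it and ends just before the next TRUE, which is exactly the window the question asks for.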
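Note: we can create the data.frame directly with data.frame(). There is no need for cbind(), which harms the column types: cbind() converts its arguments to a matrix, and a matrix can hold only one type, so the logical column becomes character.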
# name the columns directly so that 'mutation' and 'distance' exist
mx = data.frame(mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
distance = c(F, T, F, F, F, F, T, F, T))
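The coercion described in the note can be checked directly. A small sketch, using shortened vectors and illustrative names (m, good):

```r
# cbind() on a character and a logical vector yields a character matrix
m <- cbind(c("-", "mutation", "+"), c(FALSE, TRUE, FALSE))
class(m[, 2])
# "character"

# data.frame() keeps each column's own type
good <- data.frame(mutation = c("-", "mutation", "+"),
                   distance = c(FALSE, TRUE, FALSE))
class(good$distance)
# "logical"
```

With a character distance column, cumsum(distance) fails, which is why the dplyr pipeline needs the data.frame() version.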
Answer 1: (score: 0)
Hope this helps!
https://
The output is:
host = 'kbckjsdkcdn.us-east-1.es.amazonaws.com'
Answer 2: (score: 0)
You can use which() to identify the rows:
# row number of the last TRUE before the row where mx$mutation == "mutation"
last_true_before_mutation <- max(which(mx$distance == 'TRUE')[which(mx$distance == 'TRUE') < which(mx$mutation == 'mutation')])
# row number of the first TRUE after the row where mx$mutation == "mutation"
first_true_after_mutation <- min(which(mx$distance == 'TRUE')[which(mx$distance == 'TRUE') > which(mx$mutation == 'mutation')])
# all rows to remove
rem_rows <- c(seq_len(last_true_before_mutation), seq(first_true_after_mutation, nrow(mx)))
# remove the appropriate rows
mx[-rem_rows, ]
Here is a general function you can use:
before_after_mutation <- function(df) {
last_true_before_mutation <- max(which(df$distance == 'TRUE')[which(df$distance == 'TRUE') < which(df$mutation == 'mutation')])
first_true_after_mutation <- min(which(df$distance == 'TRUE')[which(df$distance == 'TRUE') > which(df$mutation == 'mutation')])
rem_rows <- c(seq_len(last_true_before_mutation), seq(first_true_after_mutation, nrow(df)))
res <- df[-rem_rows,]
return(res)
}
Usage:
before_after_mutation(mx)
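The function above relies on there being at least one TRUE on each side of the mutation row; otherwise max() and min() are called on empty vectors and return -Inf or Inf with a warning. A hedged sketch of a variant that tolerates those cases follows. The name trim_around_mutation is hypothetical, and the sketch assumes exactly one row has mutation == "mutation".

```r
trim_around_mutation <- function(df) {
  mut_row <- which(df$mutation == "mutation")
  stopifnot(length(mut_row) == 1)  # assumption: exactly one mutation row

  # as.character() so the test works for logical, character, or factor columns
  true_rows <- which(as.character(df$distance) == "TRUE")
  before <- true_rows[true_rows < mut_row]
  after  <- true_rows[true_rows > mut_row]

  # Fall back to the first/last row when no TRUE exists on that side
  first_keep <- if (length(before)) max(before) + 1 else 1
  last_keep  <- if (length(after)) min(after) - 1 else nrow(df)

  df[first_keep:last_keep, , drop = FALSE]
}

# Example data matching the question, built with data.frame()
mx2 <- data.frame(
  mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
  distance = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)
trim_around_mutation(mx2)  # keeps rows 3 to 6
```

Selecting the rows to keep, rather than building a vector of rows to drop, avoids negative indexing with an empty or infinite index.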