I want to split the data frame df into a nested list df.listing, using the values in index_cutoff as the cutoffs:
Data:
df <- data.frame(m=c("A","T","W","Z","B","A","A","W","T","K","G","B","T","B"))
index_cutoff <- c("A","B")
Attempted code:
df.listing <- split(df, df$m %in% index_cutoff) # does not work: yields only a FALSE and a TRUE group
Current output:
$`FALSE`
m
2 T
3 W
4 Z
8 W
9 T
10 K
11 G
13 T
$`TRUE`
m
1 A
5 B
6 A
7 A
12 B
14 B
Desired output, stage 1:
df.listing[[1]]
A
T
W
Z
df.listing[[2]]
B
df.listing[[3]]
A
df.listing[[4]]
A
W
T
K
G
df.listing[[5]]
B
T
df.listing[[6]]
B
Expected final output:
df.listing[[1]]
A
T
W
Z
df.listing[[2]]
B
df.listing[[3]]
A # at stage 1, lists 3 and 4 both start with cutoff "A", so list 3 is merged into the next list
A
W
T
K
G
df.listing[[4]]
B # at stage 1, lists 5 and 6 both start with cutoff "B", so they are merged as well
T
B
Thank you, and apologies that I could not provide a reproducible example from a built-in R dataset.
Answer 0 (score: 5)
We need to take the cumulative sum of the logical index and use it as the split group:
split(df, cumsum(df$m %in% index_cutoff))
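In the OP's code, df$m %in% index_cutoff has only two groups, i.e. TRUE and FALSE. Taking the cumsum changes that: the group number goes up by 1 at each TRUE value, so every cutoff row starts a new group.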
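To see why this works, it may help to print the intermediate group vector. The sketch below uses only base R and the data from the question; the names is_cut, grp and stage1 are introduced here purely for illustration.

```r
df <- data.frame(m = c("A","T","W","Z","B","A","A","W","T","K","G","B","T","B"),
                 stringsAsFactors = FALSE)
index_cutoff <- c("A", "B")

# TRUE wherever a row holds one of the cutoff values
is_cut <- df$m %in% index_cutoff

# Running count of TRUEs: it goes up by 1 at every cutoff row, so each
# cutoff row opens a new group and the rows after it join that group
grp <- cumsum(is_cut)
grp
# [1] 1 1 1 1 2 3 4 4 4 4 4 5 5 6

# Splitting on grp gives the six lists of the stage 1 output
stage1 <- split(df$m, grp)
length(stage1)
# [1] 6
```

This reproduces the stage 1 output only. Any rows that come before the first cutoff value would end up in a group labelled 0.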
Answer 1 (score: 2)
You can try something like
library(dplyr)
library(zoo)
df1 <- df %>%
mutate_if(is.factor, as.character) %>%
mutate(grp = ifelse(m %in% index_cutoff, row_number(), NA))
df2 <- df1 %>%
filter(!is.na(grp)) %>%
mutate(new_grp = na.locf(ifelse(m != lag(m, default='0'), grp, NA))) %>%
right_join(df1, by = c("m", "grp")) %>%
select(-grp) %>%
mutate(new_grp = na.locf(new_grp))
which gives the final desired grouping as
df2
# m new_grp
#1 A 1
#2 T 1
#3 W 1
#4 Z 1
#5 B 5
#6 A 6
#7 A 6
#8 W 6
#9 T 6
#10 K 6
#11 G 6
#12 B 12
#13 T 12
#14 B 12
Now run
split(df2$m, df2$new_grp)
and you will get
$`1`
[1] "A" "T" "W" "Z"
$`5`
[1] "B"
$`6`
[1] "A" "A" "W" "T" "K" "G"
$`12`
[1] "B" "T" "B"
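For comparison, the same final grouping can likely be reached in base R without the join. This is only a sketch, not the method used in either answer: it opens a new group at a cutoff row only when its value differs from the previous cutoff row. The names is_cut, cut_vals, starts_new, new_start, grp2 and stage2 are made up for illustration, and the code assumes df$m contains at least one cutoff value.

```r
df <- data.frame(m = c("A","T","W","Z","B","A","A","W","T","K","G","B","T","B"),
                 stringsAsFactors = FALSE)
index_cutoff <- c("A", "B")

is_cut   <- df$m %in% index_cutoff   # rows holding a cutoff value
cut_vals <- df$m[is_cut]             # cutoff values in order: A B A A B B

# A cutoff row starts a new group only if it differs from the previous
# cutoff row; the first cutoff row always starts one
starts_new <- c(TRUE, cut_vals[-1] != cut_vals[-length(cut_vals)])

# Map that decision back onto the full set of rows (non-cutoff rows stay FALSE)
new_start <- logical(nrow(df))
new_start[is_cut] <- starts_new

grp2 <- cumsum(new_start)
grp2
# [1] 1 1 1 1 2 3 3 3 3 3 3 4 4 4

stage2 <- split(df$m, grp2)
```

Here the groups are numbered 1 to 4 rather than labelled by row number (1, 5, 6, 12), but their contents match the split shown above.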