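I previously posted a question about summing values by group while excluding NAs or values below a certain text. This question is similar, but instead of summing by group, I want to delete all values below a specific text (in this case "end"), grouped by an ID (in this case "name"). For example,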
I want to go from this:
#Starting df
name = c("tom", "tom", "tom", "chris", "chris", "chris","chris", "jen", "jen", "jen","jen","jen")
value = c(2,10,"end",45,"end",13,20,6,"end",13,3,5)
start_df = data.frame(name,value)
To this:
#Ending df
name = c("tom", "tom", "tom", "chris", "chris","jen", "jen")
value = c(2,10,"end",45,"end",6,"end")
end_df = data.frame(name,value)
There are other texts besides "end", so I am hoping for a solution that lets me specify the text. Any ideas on how to do this? Thanks, R community.
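The requirement above (a user-specified marker text) can be sketched in base R as follows. This is a minimal sketch, not taken from any of the answers: the function name keep_through_marker is hypothetical, rows are assumed to already be in their intended order within each group, and groups that never contain the marker are kept whole (the question does not say what should happen to them). Because the test uses %in%, marker may also be a vector of several texts.

```r
# Hypothetical helper: within each group, keep rows up to and including the
# first row whose value matches `marker`; every row after it is dropped.
# Assumption: groups with no marker row are kept in full.
keep_through_marker <- function(df, group_col, value_col, marker) {
  # 1 on marker rows, 0 elsewhere; %in% returns FALSE (not NA) for NA values
  hit <- as.integer(df[[value_col]] %in% marker)
  # Number of marker rows seen strictly before the current row, per group
  seen_before <- ave(hit, df[[group_col]], FUN = function(h) cumsum(h) - h)
  df[seen_before == 0, , drop = FALSE]
}

name  <- c("tom", "tom", "tom", "chris", "chris", "chris", "chris",
           "jen", "jen", "jen", "jen", "jen")
value <- c(2, 10, "end", 45, "end", 13, 20, 6, "end", 13, 3, 5)
start_df <- data.frame(name, value)

result <- keep_through_marker(start_df, "name", "value", "end")
result
```

Using ave() keeps the rows in their original order, so no re-sorting step is needed afterwards.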
Answer 0 (score: 4)
Another solution based on dplyr:
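library(dplyr)
# keeprows is the row number on "end" rows and 0 elsewhere,
# so max(keeprows) is the position of the last "end" in each group
start_df %>%
  group_by(name) %>%
  mutate(rownum = row_number(), keeprows = (value == "end") * rownum) %>%
  filter(rownum <= max(keeprows)) %>%
  select(-rownum, -keeprows)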
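Because the cutoff is max(keeprows), this answer keeps everything up to the last "end" in a group, not the first, which only matters if a group can contain the marker more than once. A small base R illustration on a hypothetical single-group vector shows the difference between the two rules:

```r
# Hypothetical data: one group containing two "end" markers
v <- c("1", "end", "2", "end", "3")
is_end <- v == "end"

# Rule used above: keep rows up to the LAST marker (largest marker position)
keep_last <- v[seq_along(v) <= max(which(is_end))]
keep_last   # "1" "end" "2" "end"

# Alternative rule: stop at the FIRST marker
keep_first <- v[seq_len(which(is_end)[1])]
keep_first  # "1" "end"
```

A related edge case: a group with no "end" at all gets max(keeprows) == 0 and is dropped entirely.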
Answer 1 (score: 3)
A solution using dplyr and tidyr. start_df2 is the final output.
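library(dplyr)
library(tidyr)
start_df2 <- start_df %>%
  mutate(ID = 1:n()) %>%                              # remember the original row order
  group_by(name) %>%
  mutate(flag = ifelse(value %in% "end", 1, NA)) %>%  # 1 on "end" rows, NA elsewhere
  fill(flag, .direction = "up") %>%                   # copy each flag to the NA rows above it
  filter(!is.na(flag)) %>%                            # rows after the last "end" stay NA
  ungroup() %>%
  arrange(ID) %>%
  select(-ID, -flag)
start_df2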
# # A tibble: 7 x 2
# name value
# <fctr> <fctr>
# 1 tom 2
# 2 tom 10
# 3 tom end
# 4 chris 45
# 5 chris end
# 6 jen 6
# 7 jen end
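Why fill() followed by filter() works: after filling upward, a row has a non-NA flag exactly when some "end" row lies at or below it within its group. The same test can be written in base R with a reversed cumulative sum; the sketch below applies it to the "chris" rows from the question and is only an illustration of the logic, not part of the answer.

```r
# The "chris" group from the question
v <- c("45", "end", "13", "20")
# As in the answer: 1 on marker rows, NA elsewhere
flag <- ifelse(v %in% "end", 1, NA)

# Count marker rows at or below each position; > 0 means the row would
# receive a value when filling "up", so it survives filter(!is.na(flag))
filled <- rev(cumsum(rev(!is.na(flag)))) > 0
kept <- v[filled]
kept  # "45" "end"
```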
Answer 2 (score: 3)
You can get this with slice, i.e.
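library(dplyr)
start_df %>%
  group_by(name) %>%
  # [1L] stops at the first "end" if a group has several;
  # a group with no "end" still raises an error here
  slice(1L:which(value == "end")[1L])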
which gives:
# A tibble: 7 x 2
# Groups:   name [3]
    name  value
  <fctr> <fctr>
1  chris     45
2  chris    end
3    jen      6
4    jen    end
5    tom      2
6    tom     10
7    tom    end