Conditionally remove rows based on group and specified text

Time: 2017-11-12 04:53:38

Tags: r dataframe

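I previously posted a question about summing values by group while excluding NAs and any values below a certain text. This question is similar, but instead of summing by group, I want to delete all values below a specific text (in this case end), grouped by an ID (in this case name). For example,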

I want to go from this:

#Starting df
name = c("tom", "tom", "tom", "chris", "chris", "chris","chris", "jen", "jen", "jen","jen","jen") 
value = c(2,10,"end",45,"end",13,20,6,"end",13,3,5) 
start_df = data.frame(name,value) 

to this:

#Ending df
name = c("tom", "tom", "tom", "chris", "chris","jen", "jen") 
value = c(2,10,"end",45,"end",6,"end") 
end_df = data.frame(name,value) 

There are other texts besides end, so I'm hoping for a solution that lets me specify the text. Any ideas on how to do this? Thanks, R community.
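Since the marker text needs to be configurable, here is a minimal base-R sketch that takes it as an argument. drop_after_marker is a hypothetical helper name, not part of any package. It assumes the group and value columns are given by name, and that a row should be kept if no marker has appeared earlier in its group. That rule keeps the marker row itself, and it keeps groups with no marker whole.

```r
# hypothetical helper: keep each group's rows up to and including the first
# row whose value matches `marker` (one or more strings); groups without any
# marker are kept whole
drop_after_marker <- function(df, group, value, marker = "end") {
  # 1 for marker rows, 0 otherwise (as.character() also handles factor columns)
  hit <- as.integer(as.character(df[[value]]) %in% marker)
  # per group, count markers seen strictly before the current row
  seen_before <- ave(hit, df[[group]], FUN = function(x) cumsum(x) - x)
  df[seen_before == 0, , drop = FALSE]
}

name  <- c("tom", "tom", "tom", "chris", "chris", "chris", "chris",
           "jen", "jen", "jen", "jen", "jen")
value <- c(2, 10, "end", 45, "end", 13, 20, 6, "end", 13, 3, 5)
start_df <- data.frame(name, value, stringsAsFactors = FALSE)

end_df <- drop_after_marker(start_df, "name", "value", marker = "end")
end_df
```

Rows stay in their original order and keep start_df's row names. Run rownames(end_df) <- NULL if you want them renumbered. Passing a vector such as marker = c("end", "stop") cuts each group at whichever marker comes first.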

3 answers:

Answer 0 (score: 4)

Another dplyr-based solution:

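library(dplyr)

start_df %>%
    group_by(name) %>%
    # rownum numbers rows within each name; keeprows is rownum on "end" rows, 0 elsewhere
    mutate(rownum = row_number(), keeprows = (value == 'end') * rownum) %>%
    # keep rows up to the last "end"; a group with no "end" has max(keeprows) == 0 and is dropped
    filter(rownum <= max(keeprows)) %>%
    select(-rownum, -keeprows)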

Answer 1 (score: 3)

A solution using dplyr and tidyr. start_df2 is the final output.

library(dplyr)
library(tidyr)

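start_df2 <- start_df %>%
  # remember the original row order
  mutate(ID = 1:n()) %>%
  group_by(name) %>%
  # flag "end" rows with 1, all other rows NA
  mutate(flag = ifelse(value %in% "end", 1, NA)) %>%
  # within each name, copy the flag upward so rows above "end" are flagged too
  fill(flag, .direction = "up") %>%
  # rows below the last "end" are still NA and get dropped
  filter(!is.na(flag)) %>%
  ungroup() %>%
  arrange(ID) %>%
  select(-ID, -flag)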

start_df2

# # A tibble: 7 x 2
#     name  value
#    <fctr> <fctr>
# 1    tom      2
# 2    tom     10
# 3    tom    end
# 4  chris     45
# 5  chris    end
# 6    jen      6
# 7    jen    end
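Because this pipeline tests value %in% "end", it extends to several marker strings by passing a vector instead of a single string. Below is a minimal sketch of that variation. markers is a hypothetical variable, and "stop" is a hypothetical extra marker that does not occur in the example data. If a group contains more than one marker, fill() keeps everything up to the last one.

```r
library(dplyr)
library(tidyr)

# hypothetical set of marker strings; any of them ends a group
markers <- c("end", "stop")

name  <- c("tom", "tom", "tom", "chris", "chris", "chris", "chris",
           "jen", "jen", "jen", "jen", "jen")
value <- c(2, 10, "end", 45, "end", 13, 20, 6, "end", 13, 3, 5)
start_df <- data.frame(name, value, stringsAsFactors = FALSE)

start_df2 <- start_df %>%
  mutate(ID = row_number()) %>%
  group_by(name) %>%
  # flag rows matching any marker with 1, everything else NA
  mutate(flag = ifelse(value %in% markers, 1, NA)) %>%
  # copy each flag upward so rows above a marker are flagged too
  fill(flag, .direction = "up") %>%
  filter(!is.na(flag)) %>%
  ungroup() %>%
  arrange(ID) %>%
  select(-ID, -flag)
```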

Answer 2 (score: 3)

You can get this with slice:

library(dplyr)

start_df %>% 
 group_by(name) %>% 
 # keep rows 1 through the first "end" in each group (assumes every group contains "end")
 slice(1L:which(value == 'end')[1])

which gives:

# A tibble: 7 x 2
# Groups:   name [3]
    name  value
  <fctr> <fctr>
1  chris     45
2  chris    end
3    jen      6
4    jen    end
5    tom      2
6    tom     10
7    tom    end
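The slice call above fails when a group has no "end", because the upper bound of the range is then empty or NA. Below is a minimal sketch of a more tolerant variant that also takes the marker as a variable. It assumes a group without the marker should be kept whole. marker is a hypothetical variable, and the "ann" group is hypothetical data added only to exercise that case. match() returns the first position of marker in each group, or NA if it is absent, and coalesce() replaces that NA with the group size.

```r
library(dplyr)

# hypothetical marker variable; change it to cut at different text
marker <- "end"

# example data plus a hypothetical "ann" group that has no marker
name  <- c("tom", "tom", "tom", "chris", "chris", "chris", "chris",
           "jen", "jen", "jen", "jen", "jen", "ann", "ann")
value <- c(2, 10, "end", 45, "end", 13, 20, 6, "end", 13, 3, 5, 7, 8)
start_df <- data.frame(name, value, stringsAsFactors = FALSE)

result <- start_df %>%
  group_by(name) %>%
  # first marker position in the group, or the whole group if there is none
  slice(seq_len(coalesce(match(marker, value), length(value)))) %>%
  ungroup()
```

As with the original answer, rows come back ordered by group rather than in their original order.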