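I want to remove rows from each group based on conditions in two different columns. In my case, for each patient's record, I want to drop a "Death" that occurs at the first admission but keep a "Death" that occurs at a readmission.
Here is the initial data.frame: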
ConditionI <- c("2017-01-01", "2018-01-01", "2018-01-15", "2018-01-20", "2018-02-01", "2018-02-1", "2018-03-01", "2018-04-01","2018-04-10")
ConditionII <- c("Death", "Alive", "Alive", "Death", "Alive", "Alive", "Death", "Alive", "Death")
id <- c("A","B","B","B","C","C","D","E","E")
df <- data.frame(id,ConditionI,ConditionII)
My goal is:
ConditionII <- c( "Alive", "Alive", "Death", "Alive", "Alive", "Alive", "Death")
ConditionI <- c( "2018-01-01", "2018-01-15", "2018-01-20", "2018-02-01", "2018-02-1", "2018-04-01","2018-04-10")
id <- c("B","B","B","C","C","E","E")
df <- data.frame(id,ConditionI,ConditionII)
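I think this is a very basic question, but I have tried several times without getting an answer. I would really appreciate your help. Thanks in advance!
Answer 0 (score: 1)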
We can use duplicated inside subset in base R:
subset(df, !id %in% id[!duplicated(id) & ConditionII == 'Death'])
# id ConditionI ConditionII
#2 B 2018-01-01 Alive
#3 B 2018-01-15 Alive
#4 B 2018-01-20 Death
#5 C 2018-02-01 Alive
#6 C 2018-02-1 Alive
#8 E 2018-04-01 Alive
#9 E 2018-04-10 Death
Or with dplyr:
library(dplyr)
df %>%
  filter(!id %in% id[!duplicated(id) & ConditionII == 'Death'])
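Note that this expression drops every row of any id whose first row is 'Death', not only that first row. On the data in the question the two are equivalent, because A and D each have a single row. A minimal sketch of the difference, assuming a hypothetical patient F (not in the original data) who has a further row after an initial 'Death':

```r
# Hypothetical data: patient F is an assumption added for illustration only.
df2 <- data.frame(
  id          = c("B", "B", "F", "F"),
  ConditionI  = c("2018-01-01", "2018-01-20", "2018-05-01", "2018-05-10"),
  ConditionII = c("Alive", "Death", "Death", "Alive"),
  stringsAsFactors = FALSE
)

# TRUE for the first row of each id when that row is "Death"
first_death <- !duplicated(df2$id) & df2$ConditionII == "Death"

# Group-wise: remove every row of an id that starts with "Death"
drop_group <- df2[!df2$id %in% df2$id[first_death], ]

# Row-wise: remove only the offending first row
drop_row <- df2[!first_death, ]
```

Here drop_group keeps only the two B rows, while drop_row also keeps the later 'Alive' row of F. Which behaviour is wanted depends on how such records should be treated.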
Answer 1 (score: 0)
You can remove the rows where 'Death' appears in the first row of each group.
library(dplyr)
df %>%
  group_by(id) %>%
  filter(!(row_number() == 1 & ConditionII == 'Death'))
# id ConditionI ConditionII
# <chr> <chr> <chr>
#1 B 2018-01-01 Alive
#2 B 2018-01-15 Alive
#3 B 2018-01-20 Death
#4 C 2018-02-01 Alive
#5 C 2018-02-1 Alive
#6 E 2018-04-01 Alive
#7 E 2018-04-10 Death
The same logic using data.table:
library(data.table)
setDT(df)[, .SD[!(seq_len(.N) == 1 & ConditionII == 'Death')], id]
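As a rough check, the same per-group rule can be sketched in base R and compared with the ids in the question's target data frame. This assumes the rows are already sorted by admission date within each id, as they are in the question. The names row_in_group, result and expected_id are illustrative only.

```r
# Data from the question
id <- c("A", "B", "B", "B", "C", "C", "D", "E", "E")
ConditionI <- c("2017-01-01", "2018-01-01", "2018-01-15", "2018-01-20",
                "2018-02-01", "2018-02-1", "2018-03-01", "2018-04-01",
                "2018-04-10")
ConditionII <- c("Death", "Alive", "Alive", "Death", "Alive", "Alive",
                 "Death", "Alive", "Death")
df <- data.frame(id, ConditionI, ConditionII, stringsAsFactors = FALSE)

# Row number within each id, computed with base R only
row_in_group <- ave(seq_along(df$id), df$id, FUN = seq_along)

# Drop a row only when it is the first row of its id and is "Death"
result <- df[!(row_in_group == 1 & df$ConditionII == "Death"), ]

# ids of the target data frame given in the question
expected_id <- c("B", "B", "B", "C", "C", "E", "E")
matches_goal <- identical(result$id, expected_id)
```

If the data are not sorted, ordering by id and ConditionI first would be needed. Note that ConditionI is stored as character here, so it would have to be converted with as.Date before sorting.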