Keep only one row per factor level if it meets a criterion in R

Time: 2014-12-18 15:18:10

Tags: r

I have a dataset that looks like this:

ID  week  action
1   1     TRUE
1   1     FALSE
1   2     FALSE 
1   2     FALSE
1   3     FALSE
1   3     TRUE
2   1     FALSE
2   2     TRUE
2   2     FALSE
...

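What I want is to keep a single action value for each ID and each week within that ID, preferring TRUE if one exists and FALSE otherwise.

So the result would look like this: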

ID  week  action
1   1     TRUE
1   2     FALSE
1   3     TRUE
2   1     FALSE
2   2     TRUE
...

4 answers:

Answer 0: (score: 2)

Try

library(dplyr)
df %>% 
   group_by(ID, week) %>% 
   arrange(desc(action)) %>%   # TRUE sorts before FALSE
   slice(1)                    # keep the first row of each (ID, week) group
#   ID week action
#1  1    1   TRUE
#2  1    2  FALSE
#3  1    3   TRUE
#4  2    1  FALSE
#5  2    2   TRUE

Or using data.table

 library(data.table)
 setDT(df)[order(action,decreasing=TRUE),
           .SD[1] , by=list(ID, week)][order(ID,week)]
 #   ID week action
 #1:  1    1   TRUE
 #2:  1    2  FALSE
 #3:  1    3   TRUE
 #4:  2    1  FALSE
 #5:  2    2   TRUE

Or using base R, similar to the approach used by @Sam Dickson

 aggregate(action~., df, FUN=function(x) sum(x)>0)
 # ID week action
 #1  1    1   TRUE
 #2  2    1  FALSE
 #3  1    2  FALSE
 #4  2    2   TRUE
 #5  1    3   TRUE

Or, inspired by @docendo discimus, a data.table option would be

  setDT(df)[, .SD[which.max(action)], by=list(ID, week)]

Data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), week = c(1L, 
1L, 2L, 2L, 3L, 3L, 1L, 2L, 2L), action = c(TRUE, FALSE, FALSE, 
 FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)), .Names = c("ID", "week", 
 "action"), class = "data.frame", row.names = c(NA, -9L))

Answer 1: (score: 2)

I used plyr:

library(plyr)
ddply(df,.(ID,week),summarize,action=sum(action)>0)

Answer 2: (score: 2)

Two options that are similar to akrun's answer but not identical, which is why I'm posting them separately:

aggregate(action ~ ID + week, df, max)
#  ID week action
#1  1    1      1   # you can use 1/0s the same way as TRUE/FALSE
#2  2    1      0
#3  1    2      0
#4  2    2      1
#5  1    3      1

library(dplyr)
group_by(df, ID, week) %>% slice(which.max(action))
#Source: local data frame [5 x 3]
#Groups: ID, week
#
#  ID week action
#1  1    1   TRUE
#2  1    2  FALSE
#3  1    3   TRUE
#4  2    1  FALSE
#5  2    2   TRUE

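The help page for which.max tells you that it finds the first maximum of a numeric or logical vector, so even if a group has several TRUE entries (TRUE counts as 1 and FALSE as 0), it simply picks the first occurrence and returns it. You can do the opposite with which.min.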
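
To make the behaviour described above concrete, here is a minimal sketch on made-up vectors (the names x, first_true, first_false and all_false are illustrative and not part of the original answer):

# Logical values are treated as 0/1, so which.max() returns the index
# of the first TRUE and which.min() the index of the first FALSE.
x <- c(FALSE, TRUE, FALSE, TRUE)
first_true  <- which.max(x)   # 2
first_false <- which.min(x)   # 1

# When a group holds no TRUE at all, which.max() falls back to index 1,
# i.e. the first FALSE row, which is the row we want to keep anyway.
all_false <- which.max(c(FALSE, FALSE))   # 1

This is why slice(which.max(action)) and .SD[which.max(action)] always return exactly one row per group, whether or not the group contains a TRUE.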

Answer 3: (score: 2)

A base R solution using aggregate and any:

aggregate(action ~ week + ID, df, any)
#   week ID action
# 1    1  1   TRUE
# 2    2  1  FALSE
# 3    3  1   TRUE
# 4    1  2  FALSE
# 5    2  2   TRUE

Another base R solution:

subset(transform(df, action = ave(action, week, ID, FUN = any)), !duplicated(df[-3]))
#   ID week action
# 1  1    1   TRUE
# 3  1    2  FALSE
# 5  1    3   TRUE
# 7  2    1  FALSE
# 8  2    2   TRUE
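
The one-liner above can be read as two steps. The sketch below spells them out, assuming the same sample data as in the Data section of answer 0; the intermediate names flagged and result are only illustrative:

# Sample data, rebuilt here so the snippet runs on its own
df <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  week   = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 2L, 2L),
  action = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Step 1: overwrite action with "is any value TRUE in this (week, ID) group?"
flagged <- transform(df, action = ave(action, week, ID, FUN = any))

# Step 2: keep only the first row of each (ID, week) pair
result <- flagged[!duplicated(flagged[c("ID", "week")]), ]

Selecting the key columns by name rather than with df[-3] is a small variation that should keep working if the column order changes.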