My data frame looks like this:
ID = c(1,1,1,1,2,2,3,3,3,3,4,4)
TIME = as.POSIXct(c("2013-03-31 09:07:00", "2013-09-26 10:07:00", "2013-03-31 11:07:00",
"2013-09-26 12:07:00","2013-03-31 09:10:00","2013-03-31 11:11:00",
"2013-03-31 09:06:00","2013-09-26 09:04:00","2013-03-31 10:35:00",
"2013-09-26 09:07:00","2013-09-26 09:07:00","2013-09-26 10:07:00"))
var = c(0,0,1,1,0,1,0,0,1,1,0,1)
DF = data.frame(ID, TIME, var)
ID TIME var
1 1 2013-03-31 09:07:00 0
2 1 2013-09-26 10:07:00 0
3 1 2013-03-31 11:07:00 1
4 1 2013-09-26 12:07:00 1
5 2 2013-03-31 09:10:00 0
6 2 2013-03-31 11:11:00 1
7 3 2013-03-31 09:06:00 0
8 3 2013-09-26 09:04:00 0
9 3 2013-03-31 10:35:00 1
10 3 2013-09-26 09:07:00 1
11 4 2013-09-26 09:07:00 0
12 4 2013-09-26 10:07:00 1
When several rows share the same ID and var, I want to delete the row with the earliest TIME value, i.e. end up with something like this:
ID2 = c(1,1,2,2,3,3,4,4)
TIME2 = as.POSIXct(c("2013-09-26 10:07:00","2013-09-26 12:07:00","2013-03-31 09:10:00",
"2013-03-31 11:11:00","2013-09-26 09:04:00","2013-09-26 09:07:00",
"2013-09-26 09:07:00","2013-09-26 10:07:00"))
var2 = c(0,1,0,1,0,1,0,1)
DF2 = data.frame(ID2, TIME2, var2)
ID2 TIME2 var2
1 1 2013-09-26 10:07:00 0
2 1 2013-09-26 12:07:00 1
3 2 2013-03-31 09:10:00 0
4 2 2013-03-31 11:11:00 1
5 3 2013-09-26 09:04:00 0
6 3 2013-09-26 09:07:00 1
7 4 2013-09-26 09:07:00 0
8 4 2013-09-26 10:07:00 1
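As you can see, this is not simply a matter of dropping the measurements taken in March 2013, because those measurements are valid. Only measurements that are duplicated, i.e. that were taken again in September, should be affected (see e.g. ID = 2, which is still in DF2).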
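One way to see which rows the rule touches is to count the rows per (ID, var) combination. The following is a quick base R check that rebuilds only the ID and var columns from the example above. The name sizes is only illustrative. For this data, IDs 1 and 3 have two rows for each var value, while IDs 2 and 4 have only one. That is why ID = 2 keeps both of its March rows.

```r
# ID and var columns from the example data above
ID  <- c(1,1,1,1,2,2,3,3,3,3,4,4)
var <- c(0,0,1,1,0,1,0,0,1,1,0,1)

# Count rows per (ID, var) combination; only cells with a count above 1
# contain an earlier measurement that the rule should remove
sizes <- table(ID = ID, var = var)
sizes
```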
Hope you can help.
Sincerely, ykl
Answer 0 (score: 1)
Here is an option using dplyr:
library(dplyr)
DF %>% group_by(ID, var) %>% filter(n() == 1L | !TIME %in% min(TIME))
#Source: local data frame [8 x 3]
#Groups: ID, var
#
# ID TIME var
#1 1 2013-09-26 10:07:00 0
#2 1 2013-09-26 12:07:00 1
#3 2 2013-03-31 09:10:00 0
#4 2 2013-03-31 11:11:00 1
#5 3 2013-09-26 09:04:00 0
#6 3 2013-09-26 09:07:00 1
#7 4 2013-09-26 09:07:00 0
#8 4 2013-09-26 10:07:00 1
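What this does:
1) If a group has only one row, i.e. n() == 1L, that row is always returned.
2) If a group has more than one row, i.e. n() > 1L, it checks whether the TIME value is not (!) equal to the minimum (earliest) TIME value of the group. The ! negates the vector, so it is FALSE where TIME is at its minimum and that row is dropped.
Conditions 1) and 2) are combined with OR (|).
Answer 1 (score: 1)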
Using data.table:
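library(data.table)
# setDT() converts DF to a data.table by reference (DF itself is modified)
# Per (ID, var) group: keep a lone row as is, otherwise drop the row with the earliest TIME
setDT(DF)[ ,{if(.N==1) .SD else .SD[-which.min(TIME)]}, by=list(ID, var)]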
# ID var TIME
#1: 1 0 2013-09-26 10:07:00
#2: 1 1 2013-09-26 12:07:00
#3: 2 0 2013-03-31 09:10:00
#4: 2 1 2013-03-31 11:11:00
#5: 3 0 2013-09-26 09:04:00
#6: 3 1 2013-09-26 09:07:00
#7: 4 0 2013-09-26 09:07:00
#8: 4 1 2013-09-26 10:07:00
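Or, using logic similar to the approach shown by @docendo discimus:
# .I returns the original row numbers, so this also works when a group's rows are not contiguous
setDT(DF)[DF[, .I[.N == 1L | !TIME %in% min(TIME)], by = list(ID, var)]$V1]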
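For comparison, here is a minimal base R sketch of the same rule that needs no extra packages. It rebuilds DF from the question. The helper names g, grp_size, is_earliest and keep are only illustrative. It uses ave() to work out, for every row, how large its (ID, var) group is and whether its TIME is the group minimum. It then keeps a row if its group has one row or if it is not the earliest.

```r
# Example data from the question
ID   <- c(1,1,1,1,2,2,3,3,3,3,4,4)
TIME <- as.POSIXct(c("2013-03-31 09:07:00", "2013-09-26 10:07:00", "2013-03-31 11:07:00",
                     "2013-09-26 12:07:00", "2013-03-31 09:10:00", "2013-03-31 11:11:00",
                     "2013-03-31 09:06:00", "2013-09-26 09:04:00", "2013-03-31 10:35:00",
                     "2013-09-26 09:07:00", "2013-09-26 09:07:00", "2013-09-26 10:07:00"))
var  <- c(0,0,1,1,0,1,0,0,1,1,0,1)
DF   <- data.frame(ID, TIME, var)

# One factor level per (ID, var) combination actually present in the data
g <- interaction(DF$ID, DF$var, drop = TRUE)

# Number of rows in each row's group
grp_size <- ave(seq_len(nrow(DF)), g, FUN = length)

# TRUE where a row's TIME equals the earliest TIME of its group
# (ave() stores the result in a numeric vector, hence as.logical())
is_earliest <- as.logical(ave(as.numeric(DF$TIME), g,
                              FUN = function(t) t == min(t)))

# Keep singleton groups; otherwise drop the earliest row(s)
keep <- grp_size == 1L | !is_earliest
DF2  <- DF[keep, ]
DF2
```

Like the !TIME %in% min(TIME) versions above, this drops every row tied for the earliest TIME in a group. By contrast, .SD[-which.min(TIME)] drops only the first of those rows. The sample data has no such ties, so all approaches return the same eight rows.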