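Below is a sample dataset with an attendance timesheet. I want to keep, for each person, the record with the earliest punch_in and the latest punch_out (for example, for id 1, name sam: punch_in 8/6/2015 8:00:00 and punch_out 8/6/2015 16:05:00). How can I remove the other duplicate entries in R?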
id<-c(1,1,1,1,2,3,4)
name<-c("sam","sam","sam","sam","jack","john","jude")
sex<-c("M","M","M","M","M","M","F")
punch_in<-c("8/6/2015 8:00:00","8/6/2015 8:05:00","8/6/2015 8:00:00","8/6/2015 8:05:00","8/6/2015 8:06:00","8/6/2015 7:59:00","8/6/2015 8:00:00")
punch_out<-c("8/6/2015 16:00:00","8/6/2015 16:00:00","8/6/2015 16:05:00","8/6/2015 16:05:00","8/6/2015 16:00:00","8/6/2015 16:05:00","8/6/2015 16:05:00")
data<-as.data.frame(cbind(id,name,sex,punch_in,punch_out))
Answer 0 (score: 1)
id<-c(1,1,1,1,2,3,4)
name<-c("sam","sam","sam","sam","jack","john","jude")
sex<-c("M","M","M","M","M","M","F")
punch_in<-c("8/6/2015 8:00:00","8/6/2015 8:05:00","8/6/2015 8:00:00","8/6/2015 8:05:00","8/6/2015 8:06:00","8/6/2015 7:59:00","8/6/2015 8:00:00")
punch_out<-c("8/6/2015 16:00:00","8/6/2015 16:00:00","8/6/2015 16:05:00","8/6/2015 16:05:00","8/6/2015 16:00:00","8/6/2015 16:05:00","8/6/2015 16:05:00")
data<-as.data.frame(cbind(id,name,sex,punch_in,punch_out))
library(dplyr)
data %>%
  group_by(id, name, sex) %>%                # for each combination of id, name, sex
  summarise(punch_in = first(punch_in),      # keep the first punch in
            punch_out = last(punch_out)) %>% # keep the last punch out
  ungroup()                                  # forget the grouping
# # A tibble: 4 x 5
#   id    name  sex   punch_in         punch_out
#   <fct> <fct> <fct> <fct>            <fct>
# 1 1     sam   M     8/6/2015 8:00:00 8/6/2015 16:05:00
# 2 2     jack  M     8/6/2015 8:06:00 8/6/2015 16:00:00
# 3 3     john  M     8/6/2015 7:59:00 8/6/2015 16:05:00
# 4 4     jude  F     8/6/2015 8:00:00 8/6/2015 16:05:00
This assumes the rows are sorted by date, so that for each id the first row is the earliest and the last row is the latest.
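If the rows are not guaranteed to be sorted, one possible variation is to parse the timestamps and take min() and max() per group instead of first() and last(). The sketch below is not part of the original answer. It assumes the timestamps are day/month/year (the sample data does not say which; swap %d and %m if they are month/day/year), it uses UTC as an arbitrary time zone, and it rebuilds the example with data.frame() so the columns are plain character vectors. The names fmt and result are introduced only for illustration.

```r
library(dplyr)

# Same example data, built with data.frame() so columns keep their own types
data <- data.frame(
  id = c(1, 1, 1, 1, 2, 3, 4),
  name = c("sam", "sam", "sam", "sam", "jack", "john", "jude"),
  sex = c("M", "M", "M", "M", "M", "M", "F"),
  punch_in = c("8/6/2015 8:00:00", "8/6/2015 8:05:00", "8/6/2015 8:00:00",
               "8/6/2015 8:05:00", "8/6/2015 8:06:00", "8/6/2015 7:59:00",
               "8/6/2015 8:00:00"),
  punch_out = c("8/6/2015 16:00:00", "8/6/2015 16:00:00", "8/6/2015 16:05:00",
                "8/6/2015 16:05:00", "8/6/2015 16:00:00", "8/6/2015 16:05:00",
                "8/6/2015 16:05:00"),
  stringsAsFactors = FALSE
)

# Assumption: day/month/year order; the time zone is chosen arbitrarily
fmt <- "%d/%m/%Y %H:%M:%S"

result <- data %>%
  mutate(punch_in = as.POSIXct(punch_in, format = fmt, tz = "UTC"),   # parse to date-times
         punch_out = as.POSIXct(punch_out, format = fmt, tz = "UTC")) %>%
  group_by(id, name, sex) %>%            # for each combination of id, name, sex
  summarise(punch_in = min(punch_in),    # earliest punch in, regardless of row order
            punch_out = max(punch_out)) %>% # latest punch out, regardless of row order
  ungroup()                              # forget the grouping

result
```

Because the comparison is done on parsed date-times rather than on strings or factor levels, the result should not depend on how the rows happen to be ordered.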