Remove duplicates based on conditions in two columns (time data)

Date: 2018-05-23 16:44:13

Tags: r datetime duplicates

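Below is a sample of an attendance timesheet dataset. For each person I want to keep only the earliest punch_in and the latest punch_out (for example, id 1, name sam, punch_in 8/6/2015 8:00:00 and punch_out 8/6/2015 16:05:00). How can I remove the other duplicate entries in R?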

id<-c(1,1,1,1,2,3,4)
name<-c("sam","sam","sam","sam","jack","john","jude")
sex<-c("M","M","M","M","M","M","F")
punch_in<-c("8/6/2015 8:00:00","8/6/2015 8:05:00","8/6/2015 8:00:00","8/6/2015 8:05:00","8/6/2015 8:06:00","8/6/2015 7:59:00","8/6/2015 8:00:00")
punch_out<-c("8/6/2015 16:00:00","8/6/2015 16:00:00","8/6/2015 16:05:00","8/6/2015 16:05:00","8/6/2015 16:00:00","8/6/2015 16:05:00","8/6/2015 16:05:00")
data<-as.data.frame(cbind(id,name,sex,punch_in,punch_out))

1 answer:

Answer (score: 1)

id<-c(1,1,1,1,2,3,4)
name<-c("sam","sam","sam","sam","jack","john","jude")
sex<-c("M","M","M","M","M","M","F")
punch_in<-c("8/6/2015 8:00:00","8/6/2015 8:05:00","8/6/2015 8:00:00","8/6/2015 8:05:00","8/6/2015 8:06:00","8/6/2015 7:59:00","8/6/2015 8:00:00")
punch_out<-c("8/6/2015 16:00:00","8/6/2015 16:00:00","8/6/2015 16:05:00","8/6/2015 16:05:00","8/6/2015 16:00:00","8/6/2015 16:05:00","8/6/2015 16:05:00")
data<-as.data.frame(cbind(id,name,sex,punch_in,punch_out))

library(dplyr)

data %>%
  group_by(id, name, sex) %>%                 # for each combination of id, name, sex
  summarise(punch_in = first(punch_in),       # keep the first punch in
            punch_out = last(punch_out)) %>%  # keep the last punch out
  ungroup()                                   # forget the grouping

# # A tibble: 4 x 5
#   id    name  sex   punch_in         punch_out        
#   <fct> <fct> <fct> <fct>            <fct>            
# 1 1     sam   M     8/6/2015 8:00:00 8/6/2015 16:05:00
# 2 2     jack  M     8/6/2015 8:06:00 8/6/2015 16:00:00
# 3 3     john  M     8/6/2015 7:59:00 8/6/2015 16:05:00
# 4 4     jude  F     8/6/2015 8:00:00 8/6/2015 16:05:00

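This assumes the rows are already sorted by time, so that for each id the first row holds the earliest punch_in and the last row holds the latest punch_out. first() and last() select by row position, not by value, so unsorted input can give the wrong times.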
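If the rows might not be sorted, a sketch that does not depend on row order is to parse the timestamps and take the minimum punch_in and maximum punch_out per person. It uses base R only, and it assumes that id alone identifies a person and that the timestamps are month/day/year (the sample cannot distinguish 8 June from 6 August). Parsing matters because comparing the raw strings is alphabetical, so "8/6/2015 10:00:00" would sort before "8/6/2015 9:00:00".

```r
# Sample data from the question, built with data.frame() so that id stays
# numeric and the timestamps stay plain character strings.
data <- data.frame(
  id        = c(1, 1, 1, 1, 2, 3, 4),
  name      = c("sam", "sam", "sam", "sam", "jack", "john", "jude"),
  sex       = c("M", "M", "M", "M", "M", "M", "F"),
  punch_in  = c("8/6/2015 8:00:00", "8/6/2015 8:05:00", "8/6/2015 8:00:00",
                "8/6/2015 8:05:00", "8/6/2015 8:06:00", "8/6/2015 7:59:00",
                "8/6/2015 8:00:00"),
  punch_out = c("8/6/2015 16:00:00", "8/6/2015 16:00:00", "8/6/2015 16:05:00",
                "8/6/2015 16:05:00", "8/6/2015 16:00:00", "8/6/2015 16:05:00",
                "8/6/2015 16:05:00"),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are month/day/year and can be treated as UTC.
fmt <- "%m/%d/%Y %H:%M:%S"
data$punch_in  <- as.POSIXct(data$punch_in,  format = fmt, tz = "UTC")
data$punch_out <- as.POSIXct(data$punch_out, format = fmt, tz = "UTC")

# Earliest punch_in per id: sort ascending by time, keep each id's first row.
by_in    <- data[order(data$id, data$punch_in), ]
first_in <- by_in[!duplicated(by_in$id), c("id", "name", "sex", "punch_in")]

# Latest punch_out per id: sort descending by time, keep each id's first row.
by_out   <- data[order(data$id, -as.numeric(data$punch_out)), ]
last_out <- by_out[!duplicated(by_out$id), c("id", "punch_out")]

# One row per id with the earliest punch_in and the latest punch_out.
result <- merge(first_in, last_out, by = "id")
print(result)
```

For the sample data this gives one row per id; for sam that is 8:00:00 in and 16:05:00 out. With dplyr, the equivalent change to the pipeline above is to convert both columns with as.POSIXct() in a mutate() step and replace first()/last() with min()/max() inside summarise().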