In R, how to keep the earliest and latest records for each case

Time: 2015-07-15 17:49:56

Tags: r

I am looking for a solution in R.

I have a data frame:

df <- data.frame(
  Number = rep(1:5, c(1, 1, 2, 5, 2)),
  Date = c("4/8/2010", "4/8/2010","4/15/2010", "4/21/2010",
           "4/24/2010", "6/9/2010", "6/2/2010","6/25/2010",
           "6/30/2010", "7/9/2010", "7/28/2010"),
  Time = c("15:00:00", "16:00:00", "10:30:00","16:15:00",
           "11:30:00", "12:00:00", "11:00:00", "10:30:00",
           "09:07:44", "08:49:43", "08:33:55"),
  Status = c("A", NA, NA, "B", NA, "B", 
             NA, NA, "C", NA, "C"),
  stringsAsFactors = FALSE)

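For each unique value in the Number column, how can I select the earliest and the latest date (sometimes the latest date is the same but the time differs), together with the last (most recent) Status?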
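Because ties on the date are broken by the time, one way to make "earliest" and "latest" well defined is to combine the two columns into a single date-time before comparing. A minimal sketch, assuming the dates are month/day/year and using UTC as an arbitrary time zone (the names DateTime, first_idx and last_idx are introduced here for illustration only):

```r
# Two records taken from the data frame above (Number 5)
Date <- c("7/9/2010", "7/28/2010")
Time <- c("08:49:43", "08:33:55")

# Parse into POSIXct so comparisons are chronological rather than alphabetical.
# Assumption: month/day/year format; tz = "UTC" is an arbitrary choice.
DateTime <- as.POSIXct(paste(Date, Time),
                       format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Positions of the earliest and latest records in chronological order
ord <- order(DateTime)
first_idx <- ord[1]
last_idx  <- ord[length(ord)]

Date[first_idx]  # earliest date
Date[last_idx]   # latest date
```

Comparing the raw strings instead would sort "7/28/2010" before "7/9/2010", since "2" comes before "9".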

The ideal result would be:

[image: desired result table]

Thank you very much.

1 Answer:

Answer 0: (score: 3)

## NA will cause problems later, so set to 0 first
df$Status[is.na(df$Status)] <- 0
## Get the minimum and maximum of each column within each Number
## (values are compared as character strings, column by column)
earliest <- aggregate(cbind(Date, Time, Status) ~ Number, data=df, function(x){min(as.character(x))})
latest <- aggregate(cbind(Date, Time, Status) ~ Number, data=df, function(x){max(as.character(x))})

## merge two data frames by Number
output <- merge(earliest, latest, all=TRUE, by="Number")

## Set Status to nonzero observations
output$Status <- ifelse(output$Status.x!=0, output$Status.x, output$Status.y)
## Remove redundant last date
output$LastDate <- ifelse(output$Date.x==output$Date.y & output$Time.x==output$Time.y, "", output$Date.y)
## Remove redundant last time
output$LastTime <- ifelse(output$Date.x==output$Date.y & output$Time.x==output$Time.y, "", output$Time.y)

## Select relevant output
final <- subset(output, select=c(Number, Date.x, Time.x, LastDate, LastTime, Status))
## Rename columns
names(final)[2:3] <- c("FirstDate", "FirstTime")
## Set Status back to NA
final$Status[final$Status==0] <- NA

The final output is similar to what you described:

> final
  Number FirstDate FirstTime  LastDate LastTime Status
1      1  4/8/2010  15:00:00                         A
2      2  4/8/2010  16:00:00                      <NA>
3      3 4/15/2010  10:30:00 4/21/2010 16:15:00      B
4      4 4/24/2010  09:07:44  6/9/2010 12:00:00      C
5      5 7/28/2010  08:33:55  7/9/2010 08:49:43      C
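
One caveat about the output above: min() and max() on character vectors compare text, not dates, and they are applied to Date and Time independently. That is why Number 5 shows 7/28/2010 as FirstDate, and why Number 4 pairs 4/24/2010 with 09:07:44, a time that belongs to a different record. The following is a minimal sketch of one way around this, not the answer's original method. It assumes the dates are month/day/year, uses UTC as an arbitrary time zone, takes "last status" to mean the last non-NA Status in chronological order, and leaves LastDate and LastTime blank when a Number has only one record. The helper summarise_case and the DateTime column are hypothetical names introduced for illustration.

```r
# Sample data from the question
df <- data.frame(
  Number = rep(1:5, c(1, 1, 2, 5, 2)),
  Date = c("4/8/2010", "4/8/2010", "4/15/2010", "4/21/2010",
           "4/24/2010", "6/9/2010", "6/2/2010", "6/25/2010",
           "6/30/2010", "7/9/2010", "7/28/2010"),
  Time = c("15:00:00", "16:00:00", "10:30:00", "16:15:00",
           "11:30:00", "12:00:00", "11:00:00", "10:30:00",
           "09:07:44", "08:49:43", "08:33:55"),
  Status = c("A", NA, NA, "B", NA, "B", NA, NA, "C", NA, "C"),
  stringsAsFactors = FALSE)

# Combine Date and Time into a real date-time so ordering is chronological.
# Assumption: month/day/year format; tz = "UTC" is an arbitrary choice.
df$DateTime <- as.POSIXct(paste(df$Date, df$Time),
                          format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Sort by Number, then chronologically within each Number
df <- df[order(df$Number, df$DateTime), ]

# Hypothetical helper: build one summary row from the (sorted) rows of a Number
summarise_case <- function(d) {
  first  <- d[1, ]
  last   <- d[nrow(d), ]
  known  <- d$Status[!is.na(d$Status)]  # non-missing statuses, oldest first
  single <- nrow(d) == 1                # only one record for this Number
  data.frame(
    Number    = first$Number,
    FirstDate = first$Date,
    FirstTime = first$Time,
    LastDate  = if (single) "" else last$Date,
    LastTime  = if (single) "" else last$Time,
    Status    = if (length(known)) known[length(known)] else NA_character_,
    stringsAsFactors = FALSE)
}

# Apply the helper to each Number and stack the rows
result <- do.call(rbind, lapply(split(df, df$Number), summarise_case))
rownames(result) <- NULL
result
```

With this approach, Number 4 runs from 4/24/2010 11:30:00 to 6/30/2010 09:07:44, and Number 5 from 7/9/2010 08:49:43 to 7/28/2010 08:33:55, both with Status C.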