Keep the most recent date per group ID while preserving NAs in R

Time: 2018-01-25 16:41:48

Tags: r date dataframe

My data looks like this:

dat<-data.frame(ID=c("A","B","B",NA,"C"),Date=as.Date(c("2012-06-06","2012-07-07","2014-07-07",NA,NA)),stringsAsFactors=FALSE)
print(dat)
    ID       Date
1    A 2012-06-06
2    B 2012-07-07
3    B 2014-07-07
4 <NA>       <NA>
5    C       <NA>

I'm trying to keep only the most recent row for each ID, without dropping any NAs, to get something like this:

dat1<-data.frame(ID=c("A","B",NA,"C"),Date=as.Date(c("2012-06-06","2014-07-07",NA,NA)),stringsAsFactors=FALSE)
print(dat1)
    ID       Date
1    A 2012-06-06
2    B 2014-07-07
3 <NA>       <NA>
4    C       <NA>

I tried the following with dplyr:

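library(dplyr)
# Attempt 1: errors, because & is not defined for Date objects
dat1<-dat%>%group_by(ID)%>%filter(Date==max(Date&!is.na(Date)))
# Attempt 2: runs, but the rows with NA dates disappear
dat1<-dat%>%group_by(ID)%>%filter(Date==max(Date,na.rm=TRUE))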

The first produces an error, and the second still drops the NA rows. Any suggestions?
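Why the second attempt loses the NA rows: for a group whose dates are all NA, max(Date, na.rm = TRUE) has nothing left after removing the NAs, so R warns and returns -Inf. Comparing an NA date with anything gives NA, and filter() discards rows where the condition is NA. A small base-R sketch of that comparison (the variable names are illustrative only):

```r
# An all-NA date group, like the rows for ID NA and ID "C"
d <- as.Date(c(NA_character_, NA_character_))

# With na.rm = TRUE every value is removed, so max() warns and returns -Inf
m <- suppressWarnings(max(d, na.rm = TRUE))

# NA compared with anything is NA; dplyr::filter() drops rows where the test is NA
cond <- d == m
print(cond)
```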

5 Answers:

Answer 0 (score: 3)

Using data.table:

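library(data.table)
setDT(dat)
# Per-ID maximum date (NA for an ID whose dates are all missing)
dat[, max_date := max(Date), by = ID]
# Keep each ID's latest row, plus every row with a missing date
dat <- dat[(!is.na(Date) & Date == max_date) | is.na(Date), ]
# Drop the helper column
dat[, max_date := NULL]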

Output:

   ID       Date
1:  A 2012-06-06
2:  B 2014-07-07
3: NA       <NA>
4:  C       <NA>
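The filter in this answer relies on R's three-valued logic: & binds more tightly than |, and FALSE & NA is FALSE, so a row with a missing date falls through to the | is.na(Date) branch instead of evaluating to NA. A base-R sketch on plain vectors, where max_date is hard-coded (an assumption for illustration) to the per-ID maxima that max(Date), by = ID yields for the example data:

```r
date <- as.Date(c("2012-06-06", "2012-07-07", "2014-07-07", NA, NA))
# Per-ID maximum dates, written out by hand for IDs A, B, B, NA, C
max_date <- as.Date(c("2012-06-06", "2014-07-07", "2014-07-07", NA, NA))

# Same expression as in the answer; the parentheses show how R groups it.
# For the NA rows: (FALSE & NA) is FALSE, then FALSE | TRUE is TRUE.
keep <- (!is.na(date) & date == max_date) | is.na(date)
print(keep)  # TRUE FALSE TRUE TRUE TRUE
```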

Answer 1 (score: 1)

A simple solution:

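# Newest dates first, NA dates last
dat<-dat[order(dat$Date,na.last = TRUE,decreasing = TRUE),]
# Keep the first (newest) row of each ID; duplicated() treats NA as its own value
dat<-dat[!duplicated(dat$ID), ]
# Restore the original row order (compare row names as numbers, not strings)
dat[ order(as.integer(row.names(dat))), ]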
    ID       Date
1    A 2012-06-06
3    B 2014-07-07
4 <NA>       <NA>
5    C       <NA>
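The steps in this answer can be wrapped into a reusable function. This is a sketch; latest_per_id is a hypothetical name. The one substantive change is restoring row order with as.integer(rownames(...)): ordering the character row names directly would put "10" before "2" once the data has ten or more rows.

```r
dat <- data.frame(
  ID = c("A", "B", "B", NA, "C"),
  Date = as.Date(c("2012-06-06", "2012-07-07", "2014-07-07", NA, NA)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the newest row per ID without dropping NA IDs or dates
latest_per_id <- function(df) {
  # Newest dates first; rows with NA dates are placed last
  sorted <- df[order(df$Date, decreasing = TRUE, na.last = TRUE), ]
  # duplicated() treats NA as an ordinary value, so one NA-ID row survives
  kept <- sorted[!duplicated(sorted$ID), ]
  # Restore the original row order, comparing row names numerically
  kept[order(as.integer(rownames(kept))), ]
}

res <- latest_per_id(dat)
print(res)
```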

Answer 2 (score: 1)

Base R:

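# Make NA an explicit factor level so the NA ID forms its own group
dat$ID <- addNA(dat$ID)
# Newest dates first (NA dates go last)
dat <- dat[order(dat$Date, decreasing = TRUE),]
# First row of each group; na.pass keeps rows whose Date is NA
aggregate(Date ~ ID, dat, FUN = head, 1, na.action = na.pass)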
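The addNA() call is what keeps the NA ID: base-R grouping, for example split(), normally drops missing keys, while addNA() turns NA into an ordinary factor level that forms its own group. A minimal sketch using split() as a stand-in for grouping in general:

```r
ids <- c("A", "B", "B", NA, "C")

# Plain grouping silently drops the row whose ID is missing
plain_groups <- split(seq_along(ids), ids)

# addNA() converts to a factor with NA as an extra level, so it gets its own group
f <- addNA(ids)
na_groups <- split(seq_along(ids), f)

print(length(plain_groups))  # 3
print(length(na_groups))     # 4
print(levels(f))             # "A" "B" "C" NA
```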

dplyr

With dplyr, slice() makes this very concise:

dat %>%  
  group_by(ID) %>%
  arrange(desc(Date)) %>% 
  slice(1)

Output:

# A tibble: 4 x 2
# Groups: ID [4]
  ID    Date      
  <chr> <date>    
1 A     2012-06-06
2 B     2014-07-07
3 C     NA        
4 NA    NA  
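Taking the first row after sorting works because descending sorts put missing dates last: arrange(desc(Date)) does so in dplyr, and base R's order() does too with its default na.last = TRUE. A base-R sketch of the same "sort descending, take the first" idea on a hypothetical group that also contains an NA date:

```r
b_dates <- as.Date(c(NA, "2012-07-07", "2014-07-07"))

# Descending order; the NA stays last because na.last defaults to TRUE
o <- order(b_dates, decreasing = TRUE)
print(o)  # 3 2 1

# The first element after sorting is the most recent non-missing date
latest <- b_dates[o][1]
print(latest)
```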

Answer 3 (score: 0)

Using dplyr:

dat <-
  data.frame(
    ID = c("A", "B", "B", NA, "C"),
    Date = as.Date(c(
      "2012-06-06", "2012-07-07", "2014-07-07", NA, NA
    )),
    stringsAsFactors = FALSE
  )

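library(dplyr)

df <- dat %>%
  # Sort by ID, newest date first within each ID (NAs sort last)
  arrange(ID, desc(Date)) %>%
  group_by(ID) %>%
  # Keep only the first (newest) row per ID
  filter(row_number() == 1)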

Output:

# A tibble: 4 x 2
     ID       Date
  <chr>     <date>
1     A 2012-06-06
2     B 2014-07-07
3     C         NA
4  <NA>         NA

Answer 4 (score: 0)

You could try:

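library(dplyr)
dat %>%
  group_by(ID) %>%
  # 1 for the row holding the group's latest date; NA when the group's max is NA
  mutate(latest = ifelse(Date == max(Date), 1L, 0L)) %>%
  # Keep the latest row, plus rows whose flag is NA (all-NA groups)
  filter(is.na(latest) | latest == 1) %>%
  select(-latest)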

Result:
# A tibble: 4 x 2
# Groups: ID [4]
  ID    Date
  <chr> <date>
1 A     2012-06-06
2 B     2014-07-07
3 <NA>  NA
4 C     NA
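This works because max(Date) without na.rm returns NA for a group whose dates are all missing, so latest becomes NA and is.na(latest) keeps the row. The same rule has a side effect worth checking: if one ID had both a real date and an NA date, its maximum would also be NA and every row of that ID would be kept. A base-R sketch of the three cases (the mixed vector is hypothetical; the example data has no such ID):

```r
d_b     <- as.Date(c("2012-07-07", "2014-07-07"))      # like ID "B"
d_all   <- as.Date(c(NA_character_, NA_character_))    # like an all-NA group
d_mixed <- as.Date(c("2012-07-07", NA_character_))     # hypothetical mixed group

flag_b     <- d_b == max(d_b)          # FALSE TRUE: only the latest row is flagged
flag_all   <- d_all == max(d_all)      # NA NA: kept via is.na(latest)
flag_mixed <- d_mixed == max(d_mixed)  # NA NA: both rows would be kept
```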