My data looks like this:
dat<-data.frame(ID=c("A","B","B",NA,"C"),Date=as.Date(c("2012-06-06","2012-07-07","2014-07-07",NA,NA)),stringsAsFactors=FALSE)
print(dat)
ID Date
A 2012-06-06
B 2012-07-07
B 2014-07-07
<NA> <NA>
C <NA>
I am trying to keep the most recent instance of each ID, without dropping any NAs, to get something like this:
dat1<-data.frame(ID=c("A","B",NA,"C"),Date=as.Date(c("2012-06-06","2014-07-07",NA,NA)),stringsAsFactors=FALSE)
print(dat1)
ID Date
A 2012-06-06
B 2014-07-07
<NA> <NA>
C <NA>
I tried the following with dplyr:
library(dplyr)
dat1<-dat%>%group_by(ID)%>%filter(Date==max(Date&!is.na(Date)))
dat1<-dat%>%group_by(ID)%>%filter(Date==max(Date,na.rm=TRUE))
The first one throws an error, and the second one still drops the NAs. Any suggestions?
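Here is why the second attempt still loses rows. In a group whose dates are all NA (C and the NA ID in this data), max(Date, na.rm = TRUE) has nothing left to compare, so it returns -Inf with a warning. Date == -Inf is then NA, and filter() keeps only rows where the condition is TRUE. Below is a minimal base R sketch of that mechanism. The | all(is.na(d)) escape hatch at the end is my own suggestion, not something from the question.

```r
# A group whose dates are all missing, like ID "C" in the example
d <- as.Date(c(NA_character_, NA_character_))

# With na.rm = TRUE nothing is left, so max() returns -Inf (still a Date) and warns
m <- suppressWarnings(max(d, na.rm = TRUE))

# Comparing NA with anything gives NA; filter() drops rows whose condition is NA
cond <- d == m
print(cond)        # NA NA

# Suggested fix (an assumption, not from the question):
# also accept every row of a group that has no dates at all
cond_fixed <- cond | all(is.na(d))
print(cond_fixed)  # TRUE TRUE
```

In the question's pipeline, the same idea would read filter(Date == max(Date, na.rm = TRUE) | all(is.na(Date))) after group_by(ID). It still triggers the -Inf warnings for the all-NA groups. In a group that mixes dates and NAs, it keeps only the latest dated row.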
Answer 0 (score: 3)
Using data.table:
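library(data.table)
setDT(dat)
# Latest date per ID; max() without na.rm, so this is NA whenever the group contains a missing date
dat[, max_date := max(Date), by = ID]
# Keep rows holding their group's latest date, plus every row with a missing date
dat <- dat[(!is.na(Date) & Date == max_date) | is.na(Date), ]
# Drop the helper column
dat[, max_date := NULL]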
Output:
ID Date
1: A 2012-06-06
2: B 2014-07-07
3: NA <NA>
4: C <NA>
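One caveat applies here. max(Date) is called without na.rm = TRUE, so in a group that mixes real dates with NA (the example data has no such group), max_date is NA for every row of that group. The condition then evaluates to NA for the dated rows, and data.table treats NA in i like FALSE, so only the NA rows of that group survive. The small base R sketch below evaluates the same condition on such a hypothetical group.

```r
# Hypothetical group: one real date and one missing date
Date <- as.Date(c("2015-01-01", NA))

# Same condition as the answer, with max() as written (na.rm defaults to FALSE)
max_date <- max(Date)
keep <- !(is.na(Date)) & Date == max_date | is.na(Date)
print(keep)   # NA TRUE: the dated row would not be kept

# With na.rm = TRUE the dated row is kept; the NA row is kept too, via | is.na(Date)
max_date2 <- max(Date, na.rm = TRUE)
keep2 <- !(is.na(Date)) & Date == max_date2 | is.na(Date)
print(keep2)  # TRUE TRUE
```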
Answer 1 (score: 1)
A simple solution:
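dat <- dat[order(dat$Date, na.last = TRUE, decreasing = TRUE), ]  # latest dates first, NA dates last
dat <- dat[!duplicated(dat$ID), ]          # first row per ID is its latest; duplicated() treats NA as a value
dat[order(as.integer(row.names(dat))), ]   # restore the original row order (numeric, not string, sort)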
ID Date
1 A 2012-06-06
3 B 2014-07-07
4 <NA> <NA>
5 C <NA>
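Sorting by row.names() compares strings, because row.names() returns a character vector. That matches numeric order only while every row name has the same number of digits. With ten or more rows, "10" sorts before "2". Converting with as.integer() avoids this, as the short sketch below shows.

```r
rn <- c("2", "10", "3")

# String comparison: "10" sorts before "2" because "1" < "2"
order(rn)              # 2 1 3

# Numeric comparison gives the intended row order
order(as.integer(rn))  # 1 3 2
```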
Answer 2 (score: 1)
Base R
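dat$ID <- addNA(dat$ID)                            # factor with NA as an explicit level, so it forms its own group
dat <- dat[order(dat$Date, decreasing = TRUE), ]   # latest dates first (NA dates last)
aggregate(Date ~ ID, dat, FUN = head, 1, na.action = na.pass)  # head(Date, 1) per ID; na.pass keeps NA dates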
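Here is an alternative base R sketch (not part of this answer) that follows the question's filter idea instead of sorting. ave() does not treat an NA ID as a group of its own. For that reason, the hypothetical helper codes grp come from match(ID, unique(ID)): match() matches NA to NA, so the NA ID gets its own group code.

```r
dat <- data.frame(ID = c("A", "B", "B", NA, "C"),
                  Date = as.Date(c("2012-06-06", "2012-07-07", "2014-07-07", NA, NA)),
                  stringsAsFactors = FALSE)

# Integer group codes (hypothetical helper); the NA ID becomes its own group
grp <- match(dat$ID, unique(dat$ID))

# Latest date per group as a number; all-NA groups give -Inf (warning suppressed)
grp_max <- ave(as.numeric(dat$Date), grp,
               FUN = function(d) suppressWarnings(max(d, na.rm = TRUE)))

# TRUE for every row of a group that has no dates at all
no_dates <- ave(is.na(dat$Date), grp, FUN = all)

# Keep each group's latest row, plus whole groups without any date
keep <- as.numeric(dat$Date) == grp_max | no_dates
res <- dat[!is.na(keep) & keep, ]
print(res)
```

In a group that mixes dates and NAs, this keeps only the latest dated row.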
dplyr
Using slice() in dplyr is very concise:
dat %>%
  group_by(ID) %>%
  arrange(desc(Date)) %>%
  slice(1)
Output
# A tibble: 4 x 2
# Groups: ID [4]
ID Date
<chr> <date>
1 A 2012-06-06
2 B 2014-07-07
3 C NA
4 NA NA
Answer 3 (score: 0)
Using dplyr:
dat <-
  data.frame(
    ID = c("A", "B", "B", NA, "C"),
    Date = as.Date(c(
      "2012-06-06", "2012-07-07", "2014-07-07", NA, NA
    )),
    stringsAsFactors = FALSE
  )
df <- dat %>%
  arrange(ID, desc(Date)) %>%
  group_by(ID) %>%
  filter(row_number() == 1)
Output:
# A tibble: 4 x 2
ID Date
<chr> <date>
1 A 2012-06-06
2 B 2014-07-07
3 C NA
4 <NA> NA
Answer 4 (score: 0)
You could try:
library(dplyr)
dat %>%
  group_by(ID) %>%
  mutate(latest = ifelse(Date == max(Date), 1L, 0L)) %>%
  filter(is.na(latest) | latest == 1) %>%
  select(-latest)
Result:
# A tibble: 4 x 2
# Groups: ID [4]
ID Date
<chr> <date>
1 A 2012-06-06
2 B 2014-07-07
3 <NA> NA
4 C NA