Keep values from a list based on their first recorded timestamp

Time: 2019-09-27 14:03:31

Tags: r date filter

I have an external list of values:

list <- c("Google", "Yahoo", "Amazon")

For each of these values, I want to keep the row with its first (oldest) recorded timestamp in a data frame like this one:

dframe <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), name = c("Google", 
    "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"), date = c("2008-11-01", 
    "2008-11-02", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02", 
    "2008-11-03")), class = "data.frame", row.names = c(NA, -7L))

The expected output is:

id   name       date
1 Google 2008-11-01
1  Yahoo 2008-11-01
1 Amazon 2008-11-04
2 Amazon 2008-11-01
2 Google 2008-11-02

How can I do this?

With the code below, only the first record for each id is kept, rather than the first record of every individual value in the list:

library(data.table)
setDT(dframe)
date_list_first = dframe[order(date)][!duplicated(id)]

3 Answers:

Answer 0: (score: 5)

Using data.table, take the minimum date within each (id, name) group:

dframe = data.table(dframe)
dframe[, date := as.Date(date)]

dt = dframe[, .(date = min(date)), .(id, name)]

> dt
   id   name       date
1:  1 Google 2008-11-01
2:  1  Yahoo 2008-11-01
3:  1 Amazon 2008-11-04
4:  2 Amazon 2008-11-01
5:  2 Google 2008-11-02
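
The grouping above does not consult the external list, which makes no difference for the sample because every name in it is on the list. A minimal sketch, assuming rows whose name is not on the list should be dropped first, filters with %in% before grouping. The names keep and res are illustrative and do not appear in the question (keep stands in for the question's list variable, renamed so it does not share a name with base::list).

```r
library(data.table)

# Sample data from the question, with date stored as Date
dframe <- data.table(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = as.Date(c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
                   "2008-11-01", "2008-11-02", "2008-11-03"))
)

# Hypothetical name for the external list from the question
keep <- c("Google", "Yahoo", "Amazon")

# Filter to listed names, then take the earliest date per (id, name)
res <- dframe[name %in% keep, .(date = min(date)), by = .(id, name)]
print(res)
```

With the sample data this should give the same five rows as above; the filter only matters once the data contains names that are not on the list.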

Answer 1: (score: 2)

An option using base R:

dframe$date <- as.Date(dframe$date)
aggregate(date~ ., dframe, min)
#  id   name       date
#1  1 Amazon 2008-11-04
#2  2 Amazon 2008-11-01
#3  1 Google 2008-11-01
#4  2 Google 2008-11-02
#5  1  Yahoo 2008-11-01
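
aggregate() returns its rows ordered by the grouping columns, not in the order of the expected output in the question. A minimal sketch, assuming the result should be restricted to the external list and sorted by id and then date; keep and first_seen are illustrative names that are not in the original post.

```r
# Sample data from the question, with date stored as Date
dframe <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = as.Date(c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
                   "2008-11-01", "2008-11-02", "2008-11-03")),
  stringsAsFactors = FALSE
)

# Hypothetical name for the external list from the question
keep <- c("Google", "Yahoo", "Amazon")

# Earliest date per (id, name), restricted to the listed names
first_seen <- aggregate(date ~ id + name,
                        data = dframe[dframe$name %in% keep, ],
                        FUN = min)

# Sort by id, then by date; reset row names after reordering
first_seen <- first_seen[order(first_seen$id, first_seen$date), ]
rownames(first_seen) <- NULL
print(first_seen)
```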

Answer 2: (score: 1)

You can do this in dplyr:

library(dplyr)

dframe %>%
  mutate(date = as.Date(date)) %>%      # convert the character dates to Date
  group_by(id, name) %>%
  summarise(date = min(date)) %>%       # earliest date per (id, name)
  ungroup()

Nothing fancy here, just grouping and summarising.

Output:

# A tibble: 5 x 3
     id name   date      
  <int> <chr>  <date>    
1     1 Amazon 2008-11-04
2     1 Google 2008-11-01
3     1 Yahoo  2008-11-01
4     2 Amazon 2008-11-01
5     2 Google 2008-11-02
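
summarise() returns only the grouping columns and the summary, so any other columns in the data frame would be dropped. A minimal sketch of an alternative that keeps whole rows, assuming dplyr 1.0.0 or later (for slice_min()) and assuming rows not on the external list should be removed; keep and res are illustrative names that are not in the original post.

```r
library(dplyr)

# Sample data from the question
dframe <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
           "2008-11-01", "2008-11-02", "2008-11-03"),
  stringsAsFactors = FALSE
)

# Hypothetical name for the external list from the question
keep <- c("Google", "Yahoo", "Amazon")

res <- dframe %>%
  mutate(date = as.Date(date)) %>%
  filter(name %in% keep) %>%                        # keep only listed names
  group_by(id, name) %>%
  slice_min(date, n = 1, with_ties = FALSE) %>%     # one row per group: the earliest
  ungroup()
print(res)
```

Setting with_ties = FALSE guarantees a single row per group even if the earliest date occurs more than once.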