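I have an external list of values that I want to keep:
list <- c("Google", "Yahoo", "Amazon")
For each value in that list, I want the row recorded at its first (oldest) timestamp in a data frame like this one: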
dframe <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), name = c("Google",
"Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"), date = c("2008-11-01",
"2008-11-02", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02",
"2008-11-03")), class = "data.frame", row.names = c(NA, -7L))
The expected output is:
id  name    date
1   Google  2008-11-01
1   Yahoo   2008-11-01
1   Amazon  2008-11-04
2   Amazon  2008-11-01
2   Google  2008-11-02
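How can I do this?
The approach below keeps only the first record for each id, rather than the first record of every individual value from the list: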
library(data.table)
setDT(dframe)
date_list_first = dframe[order(date)][!duplicated(id)]
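Answer 0 (score: 5)
Using data.table:
library(data.table)
dframe = data.table(dframe)
dframe[, date := as.Date(date)]  # parse the strings so min() compares real dates
dt = dframe[, .(date = min(date)), .(id, name)]  # earliest date per (id, name) pair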
> dt
id name date
1: 1 Google 2008-11-01
2: 1 Yahoo 2008-11-01
3: 1 Amazon 2008-11-04
4: 2 Amazon 2008-11-01
5: 2 Google 2008-11-02
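The grouping above never uses the external list, which happens not to matter here because every name in the data is already in it. A minimal sketch of how the filter could be added, and how whole rows (rather than just the minimum date) could be kept, follows. The names `keep` and `first_seen` are hypothetical and introduced only for illustration; `keep` replaces the question's `list`, which shadows `base::list`.

```r
library(data.table)

# Hypothetical name for the external list of values to keep.
keep <- c("Google", "Yahoo", "Amazon")

# Same data as in the question, with the date column parsed up front.
dframe <- data.table(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = as.Date(c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
                   "2008-11-01", "2008-11-02", "2008-11-03"))
)

# 1. Drop names that are not in the external list.
# 2. Sort oldest first.
# 3. Keep the first row of each (id, name) pair.
first_seen <- unique(dframe[name %in% keep][order(date)], by = c("id", "name"))
```

This is close to the attempt in the question: the only real change is deduplicating on both `id` and `name` instead of on `id` alone.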
Answer 1 (score: 2)
Using base R:
dframe$date <- as.Date(dframe$date)
aggregate(date~ ., dframe, min)
# id name date
#1 1 Amazon 2008-11-04
#2 2 Amazon 2008-11-01
#3 1 Google 2008-11-01
#4 2 Google 2008-11-02
#5 1 Yahoo 2008-11-01
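If the row order of the expected output matters, or the data frame has further columns that should be carried along, one alternative in base R is to sort and then drop repeated (id, name) pairs. This is only a sketch under those assumptions; `keep`, `sub` and `first_seen` are hypothetical names that do not appear in the question.

```r
# Hypothetical name for the external list of values to keep.
keep <- c("Google", "Yahoo", "Amazon")

# Same data as in the question.
dframe <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
           "2008-11-01", "2008-11-02", "2008-11-03"),
  stringsAsFactors = FALSE
)
dframe$date <- as.Date(dframe$date)

# Keep only rows whose name is in the external list.
sub <- dframe[dframe$name %in% keep, ]

# Sort by id, then oldest date first.
sub <- sub[order(sub$id, sub$date), ]

# duplicated() flags every repeat of an (id, name) pair after its first
# occurrence, so negating it keeps the earliest row of each pair.
first_seen <- sub[!duplicated(sub[c("id", "name")]), ]
```

Unlike `aggregate`, this keeps every column of the matching row, not just the aggregated one.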
Answer 2 (score: 1)
You can do this in dplyr:
library(dplyr)
dframe %>% mutate(date = as.Date(date)) %>%
group_by(id, name) %>% summarise(date = min(date)) %>%
ungroup()
Nothing fancy, just group and summarise.
Output:
# A tibble: 5 x 3
id name date
<int> <chr> <date>
1 1 Amazon 2008-11-04
2 1 Google 2008-11-01
3 1 Yahoo 2008-11-01
4 2 Amazon 2008-11-01
5 2 Google 2008-11-02
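A possible variation, assuming dplyr 1.0.0 or later (where `slice_min` is available): filter on the external list first, then keep the earliest row per group instead of summarising. The names `keep` and `first_seen` are hypothetical and used only for this sketch.

```r
library(dplyr)

# Hypothetical name for the external list of values to keep.
keep <- c("Google", "Yahoo", "Amazon")

# Same data as in the question.
dframe <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
           "2008-11-01", "2008-11-02", "2008-11-03"),
  stringsAsFactors = FALSE
)

first_seen <- dframe %>%
  mutate(date = as.Date(date)) %>%            # parse strings into Date values
  filter(name %in% keep) %>%                  # keep only names from the list
  group_by(id, name) %>%
  slice_min(date, n = 1, with_ties = FALSE) %>%  # earliest row per (id, name)
  ungroup()
```

Because `slice_min` returns whole rows, any additional columns in the data frame survive, which `summarise` would drop.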