I have a data frame df like this:
df <- data.frame(CustomerID = c(1, 1, 1, 2, 2, 2),
Year = c(2012-02-03, 2012-03-05, 2013-10-22, 2014-03-02, 2015-02-19, 2016-11-20))
I want to select the most recent date within each CustomerID.
My expected result is:
CustomerID Latest Date
1 2013-10-22
2 2016-11-20
Answer 0 (score: 0)
One way, using dplyr and lubridate:
library(dplyr)
library(lubridate)
df %>%
mutate(Year = as_date(Year)) %>% # Convert Year from character to Date (optional)
group_by(CustomerID) %>% # Group by CustomerID
filter(Year == max(Year)) %>% # Filter Year to return max Year of each group
ungroup() # Ungroup
Alternatively:
df %>%
mutate(Year = ymd(Year)) %>% # Convert Year from character to Date (optional)
group_by(CustomerID) %>% # Group by CustomerID
arrange(desc(Year)) %>% # Arrange Year in descending order
slice(1) %>% # Slice/Take the first row of each group
ungroup() # Ungroup
Both return:
# A tibble: 2 x 2
CustomerID Year
<dbl> <date>
1 1.00 2013-10-22
2 2.00 2016-11-20
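If only one row per customer with a renamed date column is needed, as in the expected result in the question, summarise() is another option. This is a minimal sketch rather than part of the original answer: it assumes Year is stored as character strings in ISO format, and the column name LatestDate and the object name latest are made up for illustration.

```r
library(dplyr)

# Same sample data as in this answer, with Year stored as character strings
df <- data.frame(CustomerID = c(1, 1, 1, 2, 2, 2),
                 Year = c("2012-02-03", "2012-03-05", "2013-10-22",
                          "2014-03-02", "2015-02-19", "2016-11-20"),
                 stringsAsFactors = FALSE)

latest <- df %>%
  mutate(Year = as.Date(Year)) %>%      # base R conversion; ISO dates need no format string
  group_by(CustomerID) %>%              # one group per customer
  summarise(LatestDate = max(Year))     # collapse each group to its latest date (hypothetical column name)

print(latest)
```

Unlike filter(Year == max(Year)), this always gives exactly one row per CustomerID, but it drops any other columns the data frame might have.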
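The data I used is below. Note that I changed it slightly by adding stringsAsFactors = FALSE and making Year a character variable. Without quotes around the dates, R evaluates each one as arithmetic (2012-02-03 becomes 2012 - 2 - 3 = 2007), so the dates are lost: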
df <- data.frame(CustomerID = c(1, 1, 1, 2, 2, 2),
Year = c("2012-02-03",
"2012-03-05",
"2013-10-22",
"2014-03-02",
"2015-02-19",
"2016-11-20"),
stringsAsFactors = FALSE)
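To illustrate why the quotes matter, here is a small sketch comparing the two forms. The variable names unquoted and quoted are made up for illustration.

```r
# Unquoted: each entry is parsed as a subtraction of three numbers
unquoted <- c(2012-02-03, 2013-10-22)

# Quoted: the entries stay as text and can be converted to Date
quoted <- as.Date(c("2012-02-03", "2013-10-22"))

print(unquoted)       # 2007 1981
print(class(quoted))  # "Date"
```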