Select the most recent date for each ID

Posted: 2018-02-18 00:07:35

Tags: r

I have a df like this:

df <- data.frame(CustomerID = c(1, 1, 1, 2, 2, 2), 
                 Year = c(2012-02-03, 2012-03-05, 2013-10-22, 2014-03-02, 2015-02-19, 2016-11-20))

I want to select the most recent date within each CustomerID.

My expected result is:

CustomerID  Latest Date
1           2013-10-22
2           2016-11-20

1 answer:

Answer 0 (score: 0)

One way with dplyr and lubridate:

library(dplyr)
library(lubridate)

df %>% 
  mutate(Year = as_date(Year)) %>% # Convert Year from character to Date (optional)
  group_by(CustomerID) %>%         # Group by CustomerID
  filter(Year == max(Year)) %>%    # Filter Year to return max Year of each group
  ungroup()                        # Ungroup

Alternatively:

df %>% 
  mutate(Year = ymd(Year)) %>% # Convert Year from character to Date (optional)
  group_by(CustomerID) %>%     # Group by CustomerID
  arrange(desc(Year)) %>%      # Arrange Year in descending order
  slice(1) %>%                 # Slice/Take the first row of each group
  ungroup()                    # Ungroup

Both return:

# A tibble: 2 x 2
  CustomerID Year      
       <dbl> <date>    
1       1.00 2013-10-22
2       2.00 2016-11-20
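If you would rather not depend on dplyr and lubridate, the same sort-then-take-first idea can be written in base R. This is a minimal sketch, not part of the original answer; the column name `LatestDate` and the variable names `sorted` and `latest` are my own choices, picked to match the expected output in the question.

```r
# Data with the dates quoted, as in the answer's data below
df <- data.frame(CustomerID = c(1, 1, 1, 2, 2, 2),
                 Year = c("2012-02-03", "2012-03-05", "2013-10-22",
                          "2014-03-02", "2015-02-19", "2016-11-20"),
                 stringsAsFactors = FALSE)

# Convert the character dates to Date so they compare chronologically
df$Year <- as.Date(df$Year)

# Sort by CustomerID ascending, then by date descending (newest first)
sorted <- df[order(df$CustomerID, -as.numeric(df$Year)), ]

# Keep the first (newest) row of each CustomerID, like slice(1)
latest <- sorted[!duplicated(sorted$CustomerID), ]

# Rename to match the expected output and drop the leftover row names
names(latest) <- c("CustomerID", "LatestDate")
rownames(latest) <- NULL
print(latest)
```

Like `slice(1)`, this keeps exactly one row per customer even when two rows share the latest date, whereas `filter(Year == max(Year))` keeps every tied row.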

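The data I used. Note that I changed it slightly by adding stringsAsFactors = FALSE and making the Year variable character. If you don't put quotes around the dates in Year, things go wrong, because R reads an unquoted 2012-02-03 as subtraction rather than as a date: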

df <- data.frame(CustomerID = c(1, 1, 1, 2, 2, 2), 
                 Year = c("2012-02-03", 
                          "2012-03-05", 
                          "2013-10-22", 
                          "2014-03-02", 
                          "2015-02-19", 
                          "2016-11-20"),
                 stringsAsFactors = FALSE)
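To see concretely what goes wrong with the unquoted version in the question: R evaluates each value as arithmetic, so 2012-02-03 becomes 2012 - 2 - 3 = 2007. This short sketch (the names `bad` and `wrong_max` are only illustrative) shows the numbers R actually stores and what a per-customer maximum returns on them.

```r
# The question's original data frame, with the dates left unquoted
bad <- data.frame(CustomerID = c(1, 1, 1, 2, 2, 2),
                  Year = c(2012-02-03, 2012-03-05, 2013-10-22,
                           2014-03-02, 2015-02-19, 2016-11-20))

# Each "date" was evaluated as subtraction, e.g. 2012 - 2 - 3 = 2007
print(bad$Year)  # 2007 2004 1981 2009 1994 1985

# The per-customer maximum now picks the largest number, not the newest date
wrong_max <- tapply(bad$Year, bad$CustomerID, max)
print(wrong_max)
```

Here the maximum for customer 1 is 2007 (from 2012-02-03) and for customer 2 it is 2009 (from 2014-03-02), which are the oldest dates in each group rather than the newest.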