Extract the max and min dates for a specific factor in an R data frame

Time: 2015-10-12 21:54:00

Tags: r

I have something like this in a data frame:

PersonId Date_Withdrawal
       A      2012-05-01   
       A      2012-06-01
       B      2012-05-01
       C      2012-05-01
       A      2012-07-01
       A      2012-10-01
       B      2012-08-01
       B      2012-12-01
       C      2012-07-01

I want to get the max and min dates for A, B, C, etc. in another data frame.

1 answer:

Answer 0: (score: 5)

First, convert the column to a proper date class (always a good habit); then you can run a simple range by group. Here's an attempt:

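library(data.table)
# Convert df to a data.table by reference and store the column as an IDate
setDT(df)[, Date_Withdrawal := as.IDate(Date_Withdrawal)]
# range() returns c(min, max); as.list() spreads them into V1 (min) and V2 (max)
df[, as.list(range(Date_Withdrawal)), by = PersonId]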
#    PersonId         V1         V2
# 1:        A 2012-05-01 2012-10-01
# 2:        B 2012-05-01 2012-12-01
# 3:        C 2012-05-01 2012-07-01

Or

library(dplyr)
df %>%
  mutate(Date_Withdrawal = as.Date(Date_Withdrawal)) %>%
  group_by(PersonId) %>%
  summarise(Min = min(Date_Withdrawal), Max = max(Date_Withdrawal))
# Source: local data frame [3 x 3]
# 
#  PersonId        Min        Max
#    (fctr)     (date)     (date)
# 1        A 2012-05-01 2012-10-01
# 2        B 2012-05-01 2012-12-01
# 3        C 2012-05-01 2012-07-01

P.S. The base aggregate version would look like aggregate(as.Date(Date_Withdrawal) ~ PersonId, df, range), but it refuses to keep the Date class.
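
If a base-R-only result that keeps the Date class is needed, one possible workaround (not part of the original answer) is to split the dates by person and bind one small data frame per group. The sketch below rebuilds the question's data so that it runs on its own; the names by_person and res are purely illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  PersonId = c("A", "A", "B", "C", "A", "A", "B", "B", "C"),
  Date_Withdrawal = c("2012-05-01", "2012-06-01", "2012-05-01", "2012-05-01",
                      "2012-07-01", "2012-10-01", "2012-08-01", "2012-12-01",
                      "2012-07-01")
)

# Convert to a proper Date class first (as.character() guards against factors)
df$Date_Withdrawal <- as.Date(as.character(df$Date_Withdrawal))

# One Date vector per person
by_person <- split(df$Date_Withdrawal, df$PersonId)

# Build a one-row data frame per person, then stack them;
# min()/max() on Date vectors return Dates, and rbind() keeps that class
res <- do.call(rbind, lapply(names(by_person), function(id) {
  d <- by_person[[id]]
  data.frame(PersonId = id, Min = min(d), Max = max(d))
}))

print(res)
```

Here class(res$Min) and class(res$Max) should both be "Date", so the result can be used directly in further date arithmetic.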