Creating subgroups based on a time period using lubridate and dplyr

Time: 2016-10-24 21:44:31

Tags: r dplyr lubridate

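This should be a quick and simple question. Using the simple data frame below, I want to use dplyr and lubridate to group all clients whose OnsetDate falls in or after April 2015. This group will be called "NewOnset", and the rest will be "OldOnset".

I am new to this and am having trouble.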


4 answers:

Answer 0: (score: 1)

There is no need for external packages for such a simple task. In base R:

## coerce character to a valid Date
DF$OnsetDate <- as.Date(DF$OnsetDate, format = "%m/%d/%Y")
## filter rows with an onset on or after 1 April 2015
DF[DF$OnsetDate >= as.Date("2015-04-01"), ]

#    Client     City  OnsetDate
# 3     Cl3 Montreal 2015-04-19
# 4     Cl4   Ottawa 2015-07-10
# 6     Cl6 Hamilton 2016-03-11
# 8     Cl8  Toronto 2015-06-10
# 10   Cl10 Hamilton 2016-08-08
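
The filter above returns only the newer onsets. If every row should carry one of the two labels, as the question asks, a base R sketch could add a column with ifelse. The sample data is copied from the code in another answer on this page; the variable cutoff and the column name OnsetGroup are made-up names for illustration.

```r
# Sample data as given in another answer on this page
DF <- data.frame(
  Client = c("Cl1", "Cl2", "Cl3", "Cl4", "Cl5", "Cl6", "Cl7", "Cl8", "Cl9", "Cl10"),
  City = c("Toronto", "Toronto", "Montreal", "Ottawa", "Ottawa",
           "Hamilton", "Peterborough", "Toronto", "Hamilton", "Hamilton"),
  OnsetDate = c("11/04/1980", "04/08/2005", "04/19/2015", "07/10/2015", "10/10/1999",
                "03/11/2016", "09/12/2011", "06/10/2015", "02/05/1988", "08/08/2016"),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings into Date objects
DF$OnsetDate <- as.Date(DF$OnsetDate, format = "%m/%d/%Y")

# First day of April 2015 (cutoff is a hypothetical variable name)
cutoff <- as.Date("2015-04-01")

# Label every row; OnsetGroup is an illustrative column name
DF$OnsetGroup <- ifelse(DF$OnsetDate >= cutoff, "NewOnset", "OldOnset")

# Count clients per label
table(DF$OnsetGroup)
```

Keeping the label as a column leaves all ten rows in one data frame, so either group can still be pulled out later with ordinary subsetting.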

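Answer 1: (score: 1)

You can do this without any dplyr functions. Lubridate's family of parsing functions is named after the format of the object you are converting to a date. In this case you want the mdy function, because the input format is month-day-year.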

library(lubridate)
DF$OnsetDate <- mdy(DF$OnsetDate)

Then you can create new data frames by subsetting the rows according to your condition.

NewOnset <- DF[DF$OnsetDate >= as.Date("2015-04-01"), ]
OldOnset <- DF[DF$OnsetDate < as.Date("2015-04-01"), ]
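
To illustrate the naming convention described in this answer, here is a small sketch showing how the same string is read differently by mdy and dmy, followed by a date comparison. The vector onset holds three date strings from the sample data; the names onset and is_new are chosen only for this example.

```r
library(lubridate)

# The same string parses differently depending on the function chosen
mdy("04/08/2005")  # read as month-day-year: 8 April 2005
dmy("04/08/2005")  # read as day-month-year: 4 August 2005
ymd("2015-04-01")  # read as year-month-day: 1 April 2015

# Three of the sample date strings, parsed as month-day-year
onset <- mdy(c("04/19/2015", "10/10/1999", "08/08/2016"))

# Once parsed, dates compare chronologically rather than as text
is_new <- onset >= ymd("2015-04-01")
is_new
```

Picking the wrong function does not necessarily raise an error for ambiguous strings such as "04/08/2005", so it is worth checking a few parsed values against the raw input.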

Answer 2: (score: 1)

A couple of issues with your code. This should fix it:

City <- c("Toronto", "Toronto", "Montreal", "Ottawa", "Ottawa", "Hamilton", "Peterborough", "Toronto", "Hamilton", "Hamilton")
OnsetDate <- c("11/04/1980","04/08/2005","04/19/2015","07/10/2015","10/10/1999","03/11/2016","09/12/2011","06/10/2015","02/05/1988","08/08/2016")
Client <- c("Cl1","Cl2","Cl3","Cl4","Cl5","Cl6","Cl7","Cl8","Cl9","Cl10")

df <- data.frame(Client, City, OnsetDate)

df$OnsetDate <- as.Date(df$OnsetDate, format = "%m/%d/%Y")

# here comes the magic: keep onsets on or after 1 April 2015
library(dplyr)
df %>% filter(OnsetDate >= as.Date("04/01/2015", format = "%m/%d/%Y"))

You can use the format parameter, and there's no real need for the lubridate package here. The above code yields:

  Client     City  OnsetDate
1    Cl3 Montreal 2015-04-19
2    Cl4   Ottawa 2015-07-10
3    Cl6 Hamilton 2016-03-11
4    Cl8  Toronto 2015-06-10
5   Cl10 Hamilton 2016-08-08

Answer 3: (score: 0)

With dplyr:

library(dplyr)
# parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>% 
    # add and group by new column
    group_by(group = if_else(OnsetDate >= as.Date('2015-04-01'),    # condition
                             'NewOnset',    # return if above (true)
                             'OldOnset'))   # return if below (false)

## Source: local data frame [10 x 4]
## Groups: group [2]
## 
##    Client         City  OnsetDate    group
##    <fctr>       <fctr>     <date>    <chr>
## 1     Cl1      Toronto 1980-11-04 OldOnset
## 2     Cl2      Toronto 2005-04-08 OldOnset
## 3     Cl3     Montreal 2015-04-19 NewOnset
## 4     Cl4       Ottawa 2015-07-10 NewOnset
## 5     Cl5       Ottawa 1999-10-10 OldOnset
## 6     Cl6     Hamilton 2016-03-11 NewOnset
## 7     Cl7 Peterborough 2011-09-12 OldOnset
## 8     Cl8      Toronto 2015-06-10 NewOnset
## 9     Cl9     Hamilton 1988-02-05 OldOnset
## 10   Cl10     Hamilton 2016-08-08 NewOnset

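Note that the grouping by itself does not do anything here, and you could do both steps inside mutate, but you do end up with grouped data that is ready for further mutating or summarizing.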
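
As a sketch of what the grouped data is good for, the label can be created inside mutate and then passed to group_by and summarise. This uses a four-row subset of the sample data to keep it short; the result name counts and the summary columns n and earliest are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Four rows taken from the sample data
DF <- data.frame(
  Client = c("Cl3", "Cl5", "Cl8", "Cl9"),
  OnsetDate = c("04/19/2015", "10/10/1999", "06/10/2015", "02/05/1988"),
  stringsAsFactors = FALSE
)

counts <- DF %>%
  # parse the date, then derive the label from the parsed column
  mutate(OnsetDate = as.Date(OnsetDate, format = "%m/%d/%Y"),
         group = if_else(OnsetDate >= as.Date("2015-04-01"),
                         "NewOnset", "OldOnset")) %>%
  group_by(group) %>%
  # one row per group: number of clients and earliest onset
  summarise(n = n(), earliest = min(OnsetDate))

counts
```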

Another option is to use cut.Date, which returns a factor:

# parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>% 
    # add and group by new column
    group_by(group = cut(OnsetDate, 
                         breaks = c(min(OnsetDate), as.Date('2015-04-01'), max(OnsetDate)), 
                         labels = c('OldOnset', 'NewOnset'), 
                         include.lowest = TRUE))

## Source: local data frame [10 x 4]
## Groups: group [2]
## 
##    Client         City  OnsetDate    group
##    <fctr>       <fctr>     <date>   <fctr>
## 1     Cl1      Toronto 1980-11-04 OldOnset
## 2     Cl2      Toronto 2005-04-08 OldOnset
## 3     Cl3     Montreal 2015-04-19 NewOnset
## 4     Cl4       Ottawa 2015-07-10 NewOnset
## 5     Cl5       Ottawa 1999-10-10 OldOnset
## 6     Cl6     Hamilton 2016-03-11 NewOnset
## 7     Cl7 Peterborough 2011-09-12 OldOnset
## 8     Cl8      Toronto 2015-06-10 NewOnset
## 9     Cl9     Hamilton 1988-02-05 OldOnset
## 10   Cl10     Hamilton 2016-08-08 NewOnset
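
A quick way to check how cut treats the boundary is to feed it dates on either side of the cutoff. In the sketch below the first two dates are hypothetical values chosen to straddle 1 April 2015, and the last two are the earliest and latest dates in the sample data. cut.Date defaults to right = FALSE, so each interval is closed on the left and the cutoff date itself lands in the second bin.

```r
# Two hypothetical dates around the cutoff, plus the sample data's extremes
onset <- as.Date(c("2015-03-31", "2015-04-01", "1980-11-04", "2016-08-08"))

group <- cut(onset,
             breaks = c(min(onset), as.Date("2015-04-01"), max(onset)),
             labels = c("OldOnset", "NewOnset"),
             include.lowest = TRUE)  # keeps the latest date inside the last bin

data.frame(onset, group)
```

Without include.lowest = TRUE the latest date would fall outside the last interval and come back as NA.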