Subset based on date range

Time: 2017-06-26 10:03:50

Tags: r subset

I have a dataset of many players, with the player name, the player rating and the date the rating was published. For example:

Player  date           overall_rating
Aaron Cresswell 4/21/2016       74
Aaron Cresswell 12/5/2014       71
Aaron Cresswell 11/7/2014       71
Aaron Cresswell 9/18/2014       70
Aaron Cresswell 5/2/2014        70
Aaron Cresswell 4/4/2014        70
Aaron Cresswell 3/14/2014       70
Aaron Cresswell 12/13/2013      70
Aaron Cresswell 11/8/2013       70
Aaron Cresswell 10/4/2013       69
Aaron Cresswell 9/20/2013       69
Aaron Cresswell 5/3/2013        69
Aaron Cresswell 3/22/2013       69
Aaron Cresswell 3/15/2013       69
Aaron Cresswell 2/22/2013       69
Aaron Cresswell 2/15/2013       69
Aaron Cresswell 8/31/2012       68
Aaron Cresswell 2/22/2012       65
Aaron Cresswell 8/30/2011       64
Aaron Cresswell 8/30/2010       54
Aaron Cresswell 2/22/2010       51
Aaron Cresswell 8/30/2009       52
Aaron Cresswell 2/22/2009       47
Aaron Cresswell 8/30/2008       53
Aaron Cresswell 2/22/2007       53
Aaron Doran 1/7/2016        65
Aaron Doran 10/9/2015       66
Aaron Doran 9/21/2015       66
Aaron Doran 12/12/2014      67
Aaron Doran 9/18/2014       68
Aaron Doran 4/18/2014       68
Aaron Doran 3/14/2014       68
Aaron Doran 1/31/2014       69
Aaron Doran 11/29/2013      70
Aaron Doran 9/20/2013       71
Aaron Doran 5/31/2013       70
Aaron Doran 4/26/2013       70
Aaron Doran 4/19/2013       70
Aaron Doran 4/5/2013        70
Aaron Doran 3/22/2013       69
Aaron Doran 3/8/2013        69
Aaron Doran 2/15/2013       69
Aaron Doran 8/31/2012       65
Aaron Doran 2/22/2012       65
Aaron Doran 8/30/2011       65
Aaron Doran 2/22/2011       67
Aaron Doran 8/30/2010       67
Aaron Doran 2/22/2010       65
Aaron Doran 8/30/2009       65
Aaron Doran 2/22/2009       59
Aaron Doran 2/22/2007       59
Aaron Hughes    12/24/2015      70
Aaron Hughes    9/21/2015       70
Aaron Hughes    5/8/2015        69
Aaron Hughes    4/10/2015       69
Aaron Hughes    3/20/2015       70
Aaron Hughes    9/18/2014       72
Aaron Hughes    1/31/2014       72
Aaron Hughes    1/17/2014       72
Aaron Hughes    9/20/2013       73
Aaron Hughes    5/10/2013       73
Aaron Hughes    4/26/2013       74
Aaron Hughes    3/22/2013       74
Aaron Hughes    3/8/2013        74
Aaron Hughes    2/15/2013       74
Aaron Hughes    8/31/2012       74
Aaron Hughes    2/22/2012       75

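My question is how to do the following: if the date falls within a range (for example, 1 August 2006 to 30 May 2007), a new column called Season should show "2006/2007". Because a player can receive several ratings in one season, I want to keep only the last rating of each season for each player.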

2 answers:

Answer 0 (score: 0)

You can use lubridate (together with between() from data.table):

  library(lubridate) 
  library(data.table)
  start_date<-ymd("2006/08/01")
  end_date<-ymd("2007/05/30")

If df is your initial data frame, then:

  df$date<-mdy(df$date)  # the question's dates are month/day/year; make sure you don't get NA
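A minimal check, using two dates copied from the question, of why the parser has to match the month/day/year order (this snippet is illustrative and only needs lubridate):

```r
library(lubridate)

# Two dates from the question, written month/day/year
x <- c("4/21/2016", "12/5/2014")

# mdy() reads them in the intended order: 2016-04-21 and 2014-12-05
parsed_mdy <- mdy(x)
print(parsed_mdy)

# dmy() cannot parse "4/21/2016" (there is no month 21), so it returns NA,
# and it misreads "12/5/2014" as 12 May 2014
parsed_dmy <- suppressWarnings(dmy(x))
print(parsed_dmy)
```

If any NA appears after parsing the whole column, the order function probably does not match the data.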

Finally, you can add the season with:

df$Season <-ifelse(between(df$date,start_date,end_date),paste0(year(start_date),"/",year(end_date)),"")

>df
   player       date rating    Season
 1 player1 2006-09-12      a 2006/2007
 2 player1 2007-08-01      b          
 3 player2 2007-07-03      c          

Edit

For a more general solution (when the data frame spans multiple years):

player<-c("player1","player1","player2","player2","player1")
date<-c( "12/09/2006","01/08/2007","03/07/2007","25/05/2015","05/04/2016")
rating<-c("a","b","c","d","a")
df<-data.frame(player,date,rating)
df$date<-dmy(df$date)#make sure you don't get NA

#dynamic dates (based on years) 
df$start_date<-ymd(paste0(year(df$date)-1,"/08/01"))
df$end_date<-ymd(paste0(year(df$date),"/05/30"))


df$Season <- ifelse(between(df$date,df$start_date,df$end_date),paste0(year(df$start_date),"/",year(df$end_date)),paste0(year(df$start_date)+1,"/",year(df$end_date)+1))

Result:

>df
   player       date rating start_date   end_date    Season
1 player1 2006-09-12      a 2005-08-01 2006-05-30 2006/2007
2 player1 2007-08-01      b 2006-08-01 2007-05-30 2007/2008  
3 player2 2007-07-03      c 2006-08-01 2007-05-30 2007/2008
4 player2 2015-05-25      d 2014-08-01 2015-05-30 2014/2015
5 player1 2016-04-05      a 2015-08-01 2016-05-30 2015/2016 
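The code above labels each row with a season but does not yet keep only the last rating per season for each player, which the question also asks for. A base R sketch of that step follows. It assumes the same cutoff as the Edit above (a rating on or before 30 May of year y belongs to season (y-1)/y, any later date to y/(y+1)) and rebuilds a few of the question's rows so it runs on its own; make_date() comes from lubridate, and the dedup uses duplicated(fromLast = TRUE):

```r
library(lubridate)

# A few rows from the question (dates are month/day/year)
df <- data.frame(
  Player = c("Aaron Hughes", "Aaron Hughes", "Aaron Hughes", "Aaron Hughes"),
  date = c("12/24/2015", "9/21/2015", "5/8/2015", "4/10/2015"),
  overall_rating = c(70, 70, 69, 69),
  stringsAsFactors = FALSE
)
df$date <- mdy(df$date)

# Same rule as the Edit above: on or before May 30 -> (year-1)/year, otherwise year/(year+1)
cutoff <- make_date(year(df$date), 5, 30)
start_year <- ifelse(df$date <= cutoff, year(df$date) - 1, year(df$date))
df$Season <- paste0(start_year, "/", start_year + 1)

# Sort by player and date, then keep the last row of each Player/Season pair
df <- df[order(df$Player, df$date), ]
last_per_season <- df[!duplicated(df[c("Player", "Season")], fromLast = TRUE), ]
print(last_per_season)
```

Because the rows are sorted by date first, "last" in duplicated(fromLast = TRUE) means the most recent rating within each season.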

Answer 1 (score: 0)

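Here is a way to do it with dplyr and lubridate. Basically, you want to create a Season column: if the month of the rating is less than or equal to 5, the season should be year - 1/year; otherwise, it is year/year + 1. Then, after sorting by date, you can group_by player and season and pick the last rating with slice(n()).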
library(dplyr);library(lubridate)
df%>%
  mutate(date=as.Date(date,"%m/%d/%Y"),
         Season=ifelse(month(date)<=5,paste(year(date)-1,year(date),sep="/"),
                       paste(year(date),year(date)+1,sep="/")))  %>%
  arrange(date)%>%
  group_by(Player,Season)%>%
  slice(n())

            Player       date overall_rating    Season
             <chr>     <date>          <int>     <chr>
1  Aaron Cresswell 2007-02-22             53 2006/2007
2  Aaron Cresswell 2009-02-22             47 2008/2009
3  Aaron Cresswell 2010-02-22             51 2009/2010
4  Aaron Cresswell 2010-08-30             54 2010/2011
5  Aaron Cresswell 2012-02-22             65 2011/2012
6  Aaron Cresswell 2013-05-03             69 2012/2013
7  Aaron Cresswell 2014-05-02             70 2013/2014
8  Aaron Cresswell 2014-12-05             71 2014/2015
9  Aaron Cresswell 2016-04-21             74 2015/2016
10     Aaron Doran 2007-02-22             59 2006/2007
11     Aaron Doran 2009-02-22             59 2008/2009
12     Aaron Doran 2010-02-22             65 2009/2010
13     Aaron Doran 2011-02-22             67 2010/2011
14     Aaron Doran 2012-02-22             65 2011/2012
15     Aaron Doran 2013-05-31             70 2012/2013
16     Aaron Doran 2014-04-18             68 2013/2014
17     Aaron Doran 2014-12-12             67 2014/2015
18     Aaron Doran 2016-01-07             65 2015/2016
19    Aaron Hughes 2012-02-22             75 2011/2012
20    Aaron Hughes 2013-05-10             73 2012/2013
21    Aaron Hughes 2014-01-31             72 2013/2014
22    Aaron Hughes 2015-05-08             69 2014/2015
23    Aaron Hughes 2015-12-24             70 2015/2016
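To see the month <= 5 rule in isolation, here is a sketch that wraps it in a small function; season_of() is a hypothetical helper name, not part of dplyr or lubridate:

```r
library(lubridate)

# Hypothetical helper wrapping the month <= 5 rule from this answer
season_of <- function(date) {
  y <- year(date)
  ifelse(month(date) <= 5,
         paste(y - 1, y, sep = "/"),  # Jan-May: season that started the previous year
         paste(y, y + 1, sep = "/"))  # Jun-Dec: season that starts this year
}

seasons <- season_of(as.Date(c("2013-05-31", "2013-06-15", "2013-09-20")))
print(seasons)
```

Note that 31 May (for example Aaron Doran's 2013-05-31 rating) falls in 2012/2013 under this rule, whereas the 30 May cutoff in Answer 0 would place it in 2013/2014; choose the boundary that matches your season definition.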

Data

df <- read.table(text='Player  date           overall_rating
"Aaron Cresswell" 4/21/2016       74
"Aaron Cresswell" 12/5/2014       71
"Aaron Cresswell" 11/7/2014       71
"Aaron Cresswell" 9/18/2014       70
"Aaron Cresswell" 5/2/2014        70
"Aaron Cresswell" 4/4/2014        70
"Aaron Cresswell" 3/14/2014       70
"Aaron Cresswell" 12/13/2013      70
"Aaron Cresswell" 11/8/2013       70
"Aaron Cresswell" 10/4/2013       69
"Aaron Cresswell" 9/20/2013       69
"Aaron Cresswell" 5/3/2013        69
"Aaron Cresswell" 3/22/2013       69
"Aaron Cresswell" 3/15/2013       69
"Aaron Cresswell" 2/22/2013       69
"Aaron Cresswell" 2/15/2013       69
"Aaron Cresswell" 8/31/2012       68
"Aaron Cresswell" 2/22/2012       65
"Aaron Cresswell" 8/30/2011       64
"Aaron Cresswell" 8/30/2010       54
"Aaron Cresswell" 2/22/2010       51
"Aaron Cresswell" 8/30/2009       52
"Aaron Cresswell" 2/22/2009       47
"Aaron Cresswell" 8/30/2008       53
"Aaron Cresswell" 2/22/2007       53
"Aaron Doran" 1/7/2016        65
"Aaron Doran" 10/9/2015       66
"Aaron Doran" 9/21/2015       66
"Aaron Doran" 12/12/2014      67
"Aaron Doran" 9/18/2014       68
"Aaron Doran" 4/18/2014       68
"Aaron Doran" 3/14/2014       68
"Aaron Doran" 1/31/2014       69
"Aaron Doran" 11/29/2013      70
"Aaron Doran" 9/20/2013       71
"Aaron Doran" 5/31/2013       70
"Aaron Doran" 4/26/2013       70
"Aaron Doran" 4/19/2013       70
"Aaron Doran" 4/5/2013        70
"Aaron Doran" 3/22/2013       69
"Aaron Doran" 3/8/2013        69
"Aaron Doran" 2/15/2013       69
"Aaron Doran" 8/31/2012       65
"Aaron Doran" 2/22/2012       65
"Aaron Doran" 8/30/2011       65
"Aaron Doran" 2/22/2011       67
"Aaron Doran" 8/30/2010       67
"Aaron Doran" 2/22/2010       65
"Aaron Doran" 8/30/2009       65
"Aaron Doran" 2/22/2009       59
"Aaron Doran" 2/22/2007       59
"Aaron Hughes"    12/24/2015      70
"Aaron Hughes"    9/21/2015       70
"Aaron Hughes"    5/8/2015        69
"Aaron Hughes"    4/10/2015       69
"Aaron Hughes"    3/20/2015       70
"Aaron Hughes"    9/18/2014       72
"Aaron Hughes"    1/31/2014       72
"Aaron Hughes"    1/17/2014       72
"Aaron Hughes"    9/20/2013       73
"Aaron Hughes"    5/10/2013       73
"Aaron Hughes"    4/26/2013       74
"Aaron Hughes"    3/22/2013       74
"Aaron Hughes"    3/8/2013        74
"Aaron Hughes"    2/15/2013       74
"Aaron Hughes"    8/31/2012       74
"Aaron Hughes"    2/22/2012       75',header=TRUE,stringsAsFactors=FALSE)