In R

Time: 2015-10-31 21:15:59

Tags: r function

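I know there are answers here on how to assign seasons in a data frame. My question adds a complication: yearly and half-year periods, for cases where the start date does not fall within a typical season range.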

dates <- data.frame(
  StartDate = as.Date(c("01/01/2013", "04/01/2013", "10/01/2013", "06/01/2013",
                        "09/01/2013", "05/01/2013"), format = "%m/%d/%Y"),
  EndDate   = as.Date(c("12/01/2013", "12/21/2013", "05/25/2014", "08/15/2013",
                        "11/30/2013", "10/01/2013"), format = "%m/%d/%Y"))


   StartDate    EndDate
1 2013-01-01 2013-12-01
2 2013-04-01 2013-12-21
3 2013-10-01 2014-05-25
4 2013-06-01 2013-08-15
5 2013-09-01 2013-11-30
6 2013-05-01 2013-10-01

I need to write a function that adds a column named Season to the existing data frame, so that the output looks like this:

   StartDate    EndDate    Season
1 2013-01-01 2013-12-01    Yearly
2 2013-04-01 2013-12-21    Yearly
3 2013-10-01 2014-05-25 Half-year
4 2013-06-01 2013-08-15    Summer
5 2013-09-01 2013-11-30      Fall
6 2013-05-01 2013-10-01 Half-year

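Also, since the dates are all over the place, I thought it might simplify things to drop the day and year, convert what is left to numbers, and apply the function based on the month alone.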

dates$StartDate <- format(dates$StartDate, "%m")
dates$EndDate <- format(dates$EndDate, "%m")
dates$StartDate <- as.numeric(dates$StartDate)
dates$EndDate <- as.numeric(dates$EndDate)

 StartDate    EndDate
  1            12
  4            12
 10             5
  6             8
  9            11
  5            10
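A possible shortcut for the month extraction, sketched with base R only (the StartMonth and EndMonth column names are hypothetical): format(..., "%m") and as.integer() are combined in one step, and the results go into new columns, so the original Date columns stay intact for code that expects dates, such as the month()-based function in the answer. The example data from the question is rebuilt so the snippet runs on its own.

```r
# Example data from the question
dates <- data.frame(
  StartDate = as.Date(c("01/01/2013", "04/01/2013", "10/01/2013",
                        "06/01/2013", "09/01/2013", "05/01/2013"), format = "%m/%d/%Y"),
  EndDate   = as.Date(c("12/01/2013", "12/21/2013", "05/25/2014",
                        "08/15/2013", "11/30/2013", "10/01/2013"), format = "%m/%d/%Y")
)

# format(..., "%m") gives "01".."12"; as.integer() turns that into 1..12.
# Hypothetical new columns, so StartDate/EndDate keep their Date class.
dates$StartMonth <- as.integer(format(dates$StartDate, "%m"))
dates$EndMonth   <- as.integer(format(dates$EndDate, "%m"))

dates
```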

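Here is the function I tried to write. My rules, applied to the month numbers, are:

  • If the start month equals the end month, it's Yearly.
  • If EndDate - StartDate + 1 = 12, it's Yearly.
  • If EndDate - StartDate + 1 is between 8 and 11, it's Yearly.
  • If EndDate - StartDate + 1 is at least 5 but less than 8, it's Half-year.
  • If EndDate - StartDate = 6, it's Half-year.
  • Otherwise, it's one of the seasons based on 3-month intervals: Winter (12 to 2), Spring (3 to 5), Summer (6 to 8), Fall (9 to 11).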
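These rules could also be written without a loop. Below is a minimal vectorized sketch, assuming the inputs are month numbers 1-12 like those above; the classify_season() name is hypothetical. Counting the months with %% 12, so that a range crossing the year end (such as Oct to May) is counted forward, is an assumption, not something the question specifies. Under that assumption the Oct-May row covers 8 months and comes out as "Yearly" rather than the "Half-year" shown in the desired output, so that row still needs a rule decision.

```r
# Hypothetical helper: label each start/end month pair (months 1-12)
# using the rules listed above. Vectorized, so no loop is needed.
classify_season <- function(start_month, end_month) {
  # Inclusive number of months covered. %% 12 is an assumption for ranges
  # that cross the year end: Oct (10) -> May (5) gives 8, not -4.
  span <- (end_month - start_month) %% 12 + 1

  season <- rep(NA_character_, length(start_month))

  # Fixed three-month seasons; which() drops NA comparisons
  season[which(start_month == 12 & end_month == 2)]  <- "Winter"
  season[which(start_month == 3  & end_month == 5)]  <- "Spring"
  season[which(start_month == 6  & end_month == 8)]  <- "Summer"
  season[which(start_month == 9  & end_month == 11)] <- "Fall"

  # 5 to 7 months (this includes end - start == 6) -> half year
  season[which(span >= 5 & span < 8)] <- "Half Year"

  # 8 to 12 months, or the same start and end month -> yearly
  season[which(span >= 8 | start_month == end_month)] <- "Yearly"

  season
}

classify_season(c(1, 4, 10, 6, 9, 5), c(12, 12, 5, 8, 11, 10))
# "Yearly" "Yearly" "Yearly" "Summer" "Fall" "Half Year"
```

Rows with a missing month simply stay NA, which covers the blank rows mentioned below.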

If there is a simpler way, I'm open to suggestions.

Seasons <- function(dates) 
{
dates$Season <- NULL
for(i in 1:dim(dates)[1])
{
    if(dates$StartDate[i] == dates$EndDate[i]){
        dates$Season[i] <- "Yearly"  
    }
    if(dates$EndDate[i] - dates$StartDate[i] + 1 == 12){
        dates$Season[i] <- "Yearly" 
    }
    if(dates$EndDate[i] - dates$StartDate[i] == 6){
         dates$Season[i] <- "Half Year"
    }
    if(dates$EndDate[i] - dates$StartDate[i] + 1 >= 5 < 8){
        dates$Season[i] <- "Half Year" 
    }
    if(dates$EndDate[i] - dates$StartDate[i] + 1 >= 8 < 12){
        dates$Season[i] <- "Yearly"
    }
    if(dates$StartDate[i] == 12 & dates$EndDate[i] == 2){
        dates$Season[i] <- "Winter"
    }
    if(dates$StartDate[i] == 3 & dates$EndDate[i] == 5){
        dates$Season[i] <- "Spring"
    }
    if(dates$StartDate[i] == 6 & dates$EndDate[i] == 8){
        dates$Season[i] <- "Summer"
    }
    if(dates$StartDate[i] == 9 & dates$EndDate[i] == 11){
        dates$Season[i] <- "Fall"
    }
return(dates)
}
}

When I run the function, it assigns "Summer" to every date. Also, some rows are blank or have no end date, and I want to ignore those.

I'm also getting a lot of errors; these are the main ones:

Error: unexpected '<' in:
Error in dates$Season[i] <- "Half Year" : object 'i' not found
Error: unexpected '}' in "        }"

1 answer:

Answer 0 (score: 0)

The function you build should look like this:

library(data.table)  # provides month()

Seasons <- function(dates)
{
  seasons <- rep(NA, nrow(dates))
  for(i in seq_len(nrow(dates)))
  {
    s <- month(dates$StartDate[i])  # start month, 1-12
    e <- month(dates$EndDate[i])    # end month, 1-12
    # skip rows where either date is missing
    if(is.na(s) || is.na(e)) next
    if(s == e){
      seasons[i] <- "Yearly"
    }
    if(abs(e - s) + 1 == 12){
      seasons[i] <- "Yearly"
    }
    if(abs(e - s) == 6){
      seasons[i] <- "Half Year"
    }
    if(abs(e - s + 1) >= 5 & abs(e - s + 1) < 8){
      seasons[i] <- "Half Year"
    }
    if(abs(e - s) + 1 >= 8 & abs(e - s) + 1 < 12){
      seasons[i] <- "Yearly"
    }
    # fixed three-month seasons; a match here overwrites any label set above
    if(s == 12 & e == 2){
      seasons[i] <- "Winter"
    }
    if(s == 3 & e == 5){
      seasons[i] <- "Spring"
    }
    if(s == 6 & e == 8){
      seasons[i] <- "Summer"
    }
    if(s == 9 & e == 11){
      seasons[i] <- "Fall"
    }
  }
  return(seasons)
}

Running it then produces:

> dates$seasons <- Seasons(dates)
> dates
   StartDate    EndDate   seasons
1 2013-01-01 2013-12-01    Yearly
2 2013-04-01 2013-12-21    Yearly
3 2013-10-01 2014-05-25      <NA>
4 2013-06-01 2013-08-15    Summer
5 2013-09-01 2013-11-30      Fall
6 2013-05-01 2013-10-01 Half Year

A few comments on your function:

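  • The condition dates$EndDate[i] - dates$StartDate[i] + 1 >= 5 < 8 is a syntax error: R does not chain comparisons, so to check both > and < you have to write the expression twice and combine the two comparisons with &.
  • When you subtract one value from another, make sure to take the absolute value: for example, 5 - 10 + 1 == -4, which is clearly not what you want.
  • The third date does not fall into any of the categories you defined, so you need to decide on a new category for it.
  • The function above returns a vector of values, which I then attach to the original data.frame. I find this makes the task a bit clearer.
  • Since you have made up your own rules, it's best to write your own function; I don't think any existing package will do this for you.
  • The month function comes from data.table and returns the month number of a date.
  • In your function, return(dates) is inside the for loop, so the function returns (and stops) during the first iteration of the loop.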
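To make the points about comparisons and subtraction concrete for the third row (Oct to May), here is a small base-R sketch. The month_num() helper is hypothetical and uses as.POSIXlt() in place of data.table's month(). It shows that abs() removes the negative sign but measures the May-to-Oct gap instead of Oct-to-May, whereas %% 12 counts forward across the year end; whether 8 months should then be Yearly or Half Year is still the rule decision mentioned above.

```r
# Hypothetical helper: month number 1-12 using base R only
# (POSIXlt months are 0-based, hence + 1L)
month_num <- function(d) as.POSIXlt(d)$mon + 1L

s <- month_num(as.Date("2013-10-01"))  # 10
e <- month_num(as.Date("2014-05-25"))  # 5

e - s + 1              # -4: the raw difference is negative across the year end
abs(e - s) + 1         # 6: the May-to-Oct span, not Oct-to-May
(e - s) %% 12 + 1      # 8: Oct, Nov, Dec, Jan, Feb, Mar, Apr, May

span <- (e - s) %% 12 + 1
# "5 <= span < 8" has to be written as two comparisons joined with &
span >= 5 & span < 8   # FALSE
```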