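I know there are already answers here on how to apply seasons to a data frame. My question adds the complication of yearly and half-year periods, for cases where the start date does not fall within a typical season range.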
dates <- data.frame(
  StartDate = as.Date(c("01/01/2013", "04/01/2013", "10/01/2013", "06/01/2013",
                        "09/01/2013", "05/01/2013"), format = "%m/%d/%Y"),
  EndDate = as.Date(c("12/01/2013", "12/21/2013", "05/25/2014", "08/15/2013",
                      "11/30/2013", "10/01/2013"), format = "%m/%d/%Y"))
StartDate EndDate
1 2013-01-01 2013-12-01
2 2013-04-01 2013-12-21
3 2013-10-01 2014-05-25
4 2013-06-01 2013-08-15
5 2013-09-01 2013-11-30
6 2013-05-01 2013-10-01
I need to write a function that adds a Season column to my existing data frame, so that the output looks like this:
StartDate EndDate Season
1 2013-01-01 2013-12-01 Yearly
2 2013-04-01 2013-12-21 Yearly
3 2013-10-01 2014-05-25 Half-year
4 2013-06-01 2013-08-15 Summer
5 2013-09-01 2013-11-30 Fall
6 2013-05-01 2013-10-01 Half-year
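Also, since the dates fall all over the place, I thought it might simplify things to drop the day and year, convert the months to numbers, and apply the function based on the month alone.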
dates$StartDate <- format(dates$StartDate, "%m")
dates$EndDate <- format(dates$EndDate, "%m")
dates$StartDate <- as.numeric(dates$StartDate)
dates$EndDate <- as.numeric(dates$EndDate)
StartDate EndDate
1 12
4 12
10 5
6 8
9 11
5 10
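Here is the function I tried to write. My rules are: if the start month equals the end month, it is yearly; if end - start + 1 = 12, it is yearly; if end - start + 1 is between 8 and 11, it is yearly; if end - start = 6, it is half-year; if end - start + 1 is greater than or equal to 5 but less than 8, it is half-year; otherwise the season is based on the usual 3-month intervals.
If there is a simpler way, I am open to suggestions.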
Seasons <- function(dates)
{
  dates$Season <- NULL
  for(i in 1:dim(dates)[1])
  {
    if(dates$StartDate[i] == dates$EndDate[i]){
      dates$Season[i] <- "Yearly"
    }
    if(dates$EndDate[i] - dates$StartDate[i] + 1 == 12){
      dates$Season[i] <- "Yearly"
    }
    if(dates$EndDate[i] - dates$StartDate[i] == 6){
      dates$Season[i] <- "Half Year"
    }
    if(dates$EndDate[i] - dates$StartDate[i] + 1 >= 5 < 8){
      dates$Season[i] <- "Half Year"
    }
    if(dates$EndDate[i] - dates$StartDate[i] + 1 >= 8 < 12){
      dates$Season[i] <- "Yearly"
    }
    if(dates$StartDate[i] == 12 & dates$EndDate[i] == 2){
      dates$Season[i] <- "Winter"
    }
    if(dates$StartDate[i] == 3 & dates$EndDate[i] == 5){
      dates$Season[i] <- "Spring"
    }
    if(dates$StartDate[i] == 6 & dates$EndDate[i] == 8){
      dates$Season[i] <- "Summer"
    }
    if(dates$StartDate[i] == 9 & dates$EndDate[i] == 11){
      dates$Season[i] <- "Fall"
    }
    return(dates)
  }
}
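When I run the function, it applies "Summer" to every row. Also, some rows are blank or have no end date, and I want to ignore those.
I am also getting a lot of errors. These are the main ones: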
Error: unexpected '<' in:
Error in dates$Season[i] <- "Half Year" : object 'i' not found
Error: unexpected '}' in " }"
Answer 0 (score: 0)
The function you are trying to build should look like this:
library(data.table)
Seasons <- function(dates)
{
  seasons <- rep(NA, nrow(dates))
  for(i in 1:nrow(dates))
  {
    if(is.na(month(dates$StartDate[i])) | is.na(month(dates$EndDate[i]))) next
    if(month(dates$StartDate)[i] == month(dates$EndDate)[i]){
      seasons[i] <- "Yearly"
    }
    if(abs(month(dates$EndDate)[i] - month(dates$StartDate)[i]) + 1 == 12){
      seasons[i] <- "Yearly"
    }
    if(abs(month(dates$EndDate)[i] - month(dates$StartDate)[i]) == 6){
      seasons[i] <- "Half Year"
    }
    if(abs(month(dates$EndDate)[i] - month(dates$StartDate)[i] + 1) >= 5 &
       abs(month(dates$EndDate)[i] - month(dates$StartDate)[i] + 1) < 8){
      seasons[i] <- "Half Year"
    }
    if(abs(month(dates$EndDate)[i] - month(dates$StartDate)[i]) + 1 >= 8 &
       abs(month(dates$EndDate)[i] - month(dates$StartDate)[i]) + 1 < 12){
      seasons[i] <- "Yearly"
    }
    if(month(dates$StartDate)[i] == 12 & month(dates$EndDate)[i] == 2){
      seasons[i] <- "Winter"
    }
    if(month(dates$StartDate)[i] == 3 & month(dates$EndDate)[i] == 5){
      seasons[i] <- "Spring"
    }
    if(month(dates$StartDate)[i] == 6 & month(dates$EndDate)[i] == 8){
      seasons[i] <- "Summer"
    }
    if(month(dates$StartDate)[i] == 9 & month(dates$EndDate)[i] == 11){
      seasons[i] <- "Fall"
    }
  }
  return(seasons)
}
When you then run it, it produces:
> dates$seasons <- Seasons(dates)
> dates
StartDate EndDate seasons
1 2013-01-01 2013-12-01 Yearly
2 2013-04-01 2013-12-21 Yearly
3 2013-10-01 2014-05-25 <NA>
4 2013-06-01 2013-08-15 Summer
5 2013-09-01 2013-11-30 Fall
6 2013-05-01 2013-10-01 Half Year
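Row 3 comes out as <NA> rather than the "Half-year" the question asks for. As far as I can tell, that is because the "Half Year" range test in the function puts the + 1 inside abs(), while the other tests add it outside. A small sketch of the difference for that row (the variable names are only for illustration, and which form was intended is an assumption on my part):

```r
start_month <- 10  # October, StartDate of row 3
end_month   <- 5   # May, EndDate of row 3

# + 1 inside abs(), as in the "Half Year" range test of the function
inside  <- abs(end_month - start_month + 1)   # 4, outside the 5-to-7 range
# + 1 outside abs(), as in the "Yearly" tests of the function
outside <- abs(end_month - start_month) + 1   # 6, inside the 5-to-7 range
```

With the second form, row 3 would be labelled "Half Year", which seems closer to the output requested in the question.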
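A few comments on your function:
- if(dates$EndDate[i] - dates$StartDate[i] + 1 >= 5 < 8) is a syntax error. To check both a lower and an upper bound, you need to write the expression twice and join the two comparisons with &.
- Subtracting month numbers can give a negative result, for example 5 - 10 + 1 == -4, which is clearly not what you want. That is why the version above takes abs() of the difference.
- month comes from data.table and returns the month number of a date.
- return(dates) is inside the for loop, so your function exits during the first iteration of the loop.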
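Since you asked about a simpler way: the same rules can probably be written without a loop. The following is only a minimal sketch under a few assumptions. The name classify_season is made up for this example, it uses base R's format() instead of data.table's month(), and it applies abs(end - start) + 1 consistently, so it labels row 3 "Half Year" instead of <NA>. Rows with a missing start or end date stay NA.

```r
# Hypothetical helper (not from the question): vectorised version of the rules.
classify_season <- function(start, end) {
  s <- as.integer(format(start, "%m"))   # month number of the start date
  e <- as.integer(format(end, "%m"))     # month number of the end date
  span <- abs(e - s) + 1                 # inclusive month span; NA if a date is NA

  out <- rep(NA_character_, length(span))
  # which() drops NA conditions, so rows with missing dates are left as NA
  out[which(s == e | span >= 8)]    <- "Yearly"
  out[which(span >= 5 & span < 8)]  <- "Half Year"
  # The season rules come last so they override the span-based labels
  out[which(s == 12 & e == 2)]      <- "Winter"
  out[which(s == 3  & e == 5)]      <- "Spring"
  out[which(s == 6  & e == 8)]      <- "Summer"
  out[which(s == 9  & e == 11)]     <- "Fall"
  out
}

dates <- data.frame(
  StartDate = as.Date(c("2013-01-01", "2013-04-01", "2013-10-01",
                        "2013-06-01", "2013-09-01", "2013-05-01")),
  EndDate   = as.Date(c("2013-12-01", "2013-12-21", "2014-05-25",
                        "2013-08-15", "2013-11-30", "2013-10-01")))

dates$Season <- classify_season(dates$StartDate, dates$EndDate)
```

The order of the assignments matters here: a December to February range has a span of 11, so it is first marked "Yearly" and then overwritten with "Winter", the same way the later if blocks win in the loop version.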