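I would like to split my dataset into years according to the definition of the hydrological year. According to Wikipedia, "Due to meteorological and geographical factors, the definition of the water year varies. In the United States, the hydrological year is the period between October 1st of one year and September 30th of the next." I use the Polish definition of the hydrological year (it starts on November 1st and ends on October 31st).
The sample dataset looks like this: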
sampleData <- structure(list(date = structure(c(15946, 15947, 15875, 15910, 15869, 15888, 15823, 16059, 16068, 16067), class = "Date"),`example value` = c(-0.325806595888448, 0.116001346459147, 1.68884381116696, -0.480527505762716, -0.50307381813168,-1.12032214801472, -0.659699514672226, -0.547101497279717, 0.729148872679021,-0.769760735764215)), .Names = c("date", "example value"), row.names = c(NA, -10L), class = "data.frame")
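For some reason, the "cut" function in my code complains that "breaks" and "labels" have different lengths (but they do not). If I omit the "labels" option in cut (as shown below), the function works perfectly. What is wrong with the labels?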
ToHydroQuarters <- function(df)
{
  result <- df
  yearStart <- as.numeric(format(min(df$date), '%Y')) - 1
  # Hydrological year in Poland starts at November 1st
  DateStart <- as.Date(paste(yearStart, "-11-01", sep = ""))
  breaks <- seq(from = DateStart, to = max(df$date) + 90, by = "quarter")
  breakYear <- format(breaks, '%Y')
  # Please, do not create labels in such way.
  # Please note that for November and December we have next hydrological year - since it started at 1st November. So, we need to check month to decide which year we have (?) or use cut function again as mentioned here: http://stackoverflow.com/questions/22073881/hydrological-year-time-series
  labels <- c(paste("Winter", breakYear[1]),
              paste("Spring", breakYear[2]),
              paste("Summer", breakYear[3]),
              paste("Autumn", breakYear[4]),
              paste("Autumn", breakYear[5]))
  ###### Here is problem - once I add labels parameter, function complains about different lengths
  result$hydroYear <- cut(df$date, breaks)
  result
}
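Answer 0 (score: 2)
First, I think it is unwise to have the labels as "hard-coded" variables inside the function, because they cannot be checked without some kind of reproducible example, but I can see what you are trying to achieve.
You claim that your breaks and labels should be the right length, but the function itself does not always work (this is without labels; even if the labels did work, the cut function does not handle the last partial period of dates).
For example: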
library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="day")))
a <- ToHydroQuarters(df)
tail(a)
Returns:
date hydroYear
971 2011-08-29 <NA>
972 2011-08-30 <NA>
973 2011-08-31 <NA>
974 2011-09-01 <NA>
975 2011-09-02 <NA>
976 2011-09-03 <NA>
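The NA values appear because any date on or after the last break falls outside every interval. The following is a minimal sketch of that effect, using made-up dates and hypothetical variable names (`short_breaks`, `long_breaks`) rather than the function above:

```r
# Two illustrative dates; the second lies after the last quarterly break
dates <- as.Date(c("2011-07-15", "2011-09-03"))

# Breaks that stop at the latest date: the last break is 2011-08-01
short_breaks <- seq(as.Date("2010-11-01"), to = max(dates), by = "quarter")

# Breaks extended by 90 days: a break at 2011-11-01 now closes the last quarter
long_breaks <- seq(as.Date("2010-11-01"), to = max(dates) + 90, by = "quarter")

uncovered <- cut(dates, short_breaks)  # second element is NA
covered   <- cut(dates, long_breaks)   # no NA values

print(data.frame(date = dates, uncovered = uncovered, covered = covered))
```

With the shorter sequence the September date has no enclosing interval, which matches the tail of the output shown above.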
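Doing something like `breaks <- seq(from=DateStart, to=max(df$date)+90, by="quarter")` does fix this, because it forces a closing break to actually exist. This may solve the labels problem you are having in the function, but it does not make the function "generic".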
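As background on the length complaint: when `cut` is given a vector of breaks, it expects one label per interval, that is, one fewer label than there are breaks. A small sketch with illustrative dates and labels (none of them from the question) shows both cases:

```r
dates <- as.Date(c("2013-08-29", "2013-06-19", "2013-04-28"))

# Five quarterly breaks starting on 1 November define four intervals
breaks <- seq(as.Date("2012-11-01"), by = "quarter", length.out = 5)

# Four labels for four intervals: accepted
labels4 <- c("Winter 2013", "Spring 2013", "Summer 2013", "Autumn 2013")
ok <- cut(dates, breaks, labels = labels4)

# Five labels for four intervals: cut() stops with an error, caught here
too_many <- tryCatch(
  cut(dates, breaks, labels = c(labels4, "Winter 2014")),
  error = function(e) "error"
)

print(ok)
print(too_many)
```

Because the number of breaks in `ToHydroQuarters` depends on the date range of the input, a fixed set of five labels only fits when the data happens to produce exactly six breaks.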
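As far as the coding goes, I think it would be better to convert the month and the year separately, since that is easier to follow. For example, you could use `library(lubridate)` to extract the month easily, and specify the breaks and labels as usual. I was thinking of a function that looks something like this: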
thq <- function(date) {
  mnth <- cut(month(date), breaks=c(1, 4, 7, 10, 12),
              right=FALSE, include.lowest=TRUE,
              labels=c("Spring", "Summer", "Autumn", "Winter"))
  return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}
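To check which months each label covers, the same breaks can be applied to the month numbers 1 to 12 directly. This is only a sketch using base R, and the names `groups`, `month_map` and `no_lowest` are made up for the illustration:

```r
months <- 1:12

# Same breaks and labels as in thq(): intervals [1,4), [4,7), [7,10), [10,12]
groups <- cut(months, breaks = c(1, 4, 7, 10, 12),
              right = FALSE, include.lowest = TRUE,
              labels = c("Spring", "Summer", "Autumn", "Winter"))

# Month numbers grouped by label
month_map <- split(months, groups)
print(month_map)

# Without include.lowest = TRUE the last interval is [10,12),
# so December (12) would fall outside it and become NA
no_lowest <- cut(12, breaks = c(1, 4, 7, 10, 12), right = FALSE)
print(no_lowest)
```

With `right=FALSE`, the `include.lowest=TRUE` argument closes the last interval on the right, which is what keeps December inside the "Winter" group.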
Then, using some dummy data...
library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="month")))
thq <- function(date) {
  mnth <- cut(month(date), breaks=c(1, 4, 7, 10, 12),
              right=FALSE, include.lowest=TRUE,
              labels=c("Spring", "Summer", "Autumn", "Winter"))
  return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}
df$newdate <- thq(df$date)
With the following output:
date newdate
1 2009-01-01 Spring 2009
2 2009-02-01 Spring 2009
3 2009-03-01 Spring 2009
4 2009-04-01 Summer 2009
5 2009-05-01 Summer 2009
6 2009-06-01 Summer 2009
7 2009-07-01 Autumn 2009
8 2009-08-01 Autumn 2009
9 2009-09-01 Autumn 2009
10 2009-10-01 Winter 2010
11 2009-11-01 Winter 2010
12 2009-12-01 Winter 2010
13 2010-01-01 Spring 2010
14 2010-02-01 Spring 2010
15 2010-03-01 Spring 2010
16 2010-04-01 Summer 2010
17 2010-05-01 Summer 2010
18 2010-06-01 Summer 2010
19 2010-07-01 Autumn 2010
20 2010-08-01 Autumn 2010
21 2010-09-01 Autumn 2010
22 2010-10-01 Winter 2011
23 2010-11-01 Winter 2011
24 2010-12-01 Winter 2011
25 2011-01-01 Spring 2011
26 2011-02-01 Spring 2011
27 2011-03-01 Spring 2011
28 2011-04-01 Summer 2011
29 2011-05-01 Summer 2011
30 2011-06-01 Summer 2011
31 2011-07-01 Autumn 2011
32 2011-08-01 Autumn 2011
33 2011-09-01 Autumn 2011
If the quarters are in an unusual order, you can use the modulo operator to shift the months...
thq <- function(date) {
  # (month + 1) %% 12 maps Nov -> 0, Dec -> 1, Jan -> 2, ..., Oct -> 11;
  # use the function argument here, not the global df
  mnth <- cut(((month(date)+1) %% 12), breaks=c(0, 3, 6, 9, 12),
              right=FALSE, include.lowest=TRUE,
              labels=c("Nov_Jan", "Feb_Apr", "May_Jul", "Aug_Oct")
  )
  # you will need to alter the return statement yourself, because
  # I feel there is enough information for you to do it, rather than
  # me changing it every time you change the question.
  return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}
library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="day")))
df$new <- thq(df$date)
head(df)
Output:
> head(df)
date new
1 2009-01-01 Nov_Jan 2009
2 2009-01-02 Nov_Jan 2009
3 2009-01-03 Nov_Jan 2009
4 2009-01-04 Nov_Jan 2009
5 2009-01-05 Nov_Jan 2009
6 2009-01-06 Nov_Jan 2009
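One possible way to finish the return statement for the Polish definition is sketched below. It assumes that November and December should be counted in the hydrological year that ends the following October; the function name `hydro_quarter` is hypothetical, and base R `format` is used instead of lubridate so the snippet runs on its own:

```r
hydro_quarter <- function(date) {
  m <- as.integer(format(date, "%m"))  # calendar month, 1..12
  y <- as.integer(format(date, "%Y"))  # calendar year

  # Shift months so that November becomes 0 and October becomes 11
  shifted <- (m + 1) %% 12
  quarter <- cut(shifted, breaks = c(0, 3, 6, 9, 12),
                 right = FALSE, include.lowest = TRUE,
                 labels = c("Nov_Jan", "Feb_Apr", "May_Jul", "Aug_Oct"))

  # Assumption: November and December belong to the next hydrological year
  hydro_year <- ifelse(m >= 11, y + 1L, y)
  paste(quarter, hydro_year)
}

example_dates <- as.Date(c("2013-10-31", "2013-11-01", "2014-01-15", "2014-02-01"))
result <- hydro_quarter(example_dates)
print(result)
```

Testing the month rather than the quarter label matters here, because the "Nov_Jan" quarter spans two calendar years and only its first two months need the year shifted.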