R - Splitting data into hydrological quarters

Time: 2015-08-30 04:52:47

Tags: r

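I want to split my dataset according to the definition of the hydrological year. According to Wikipedia, "Due to meteorological and geographical factors, the definition of the water year varies. In the United States, the hydrological year is the period from October 1 of one year to September 30 of the next." I use the Polish definition of the hydrological year (it starts on November 1 and ends on October 31).

The sample dataset looks like this: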

sampleData <- structure(list(date = structure(c(15946, 15947, 15875, 15910, 15869, 15888, 15823, 16059, 16068, 16067), class = "Date"),`example value` = c(-0.325806595888448, 0.116001346459147, 1.68884381116696, -0.480527505762716, -0.50307381813168,-1.12032214801472, -0.659699514672226, -0.547101497279717, 0.729148872679021,-0.769760735764215)), .Names = c("date", "example value"), row.names = c(NA, -10L), class = "data.frame")

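For some reason, the "cut" function in my code complains that "breaks" and "labels" have different lengths (but they do not). If I omit the "labels" option in cut (as shown below), the function works perfectly. What is wrong with the labels?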

ToHydroQuarters <-function(df)
{
  result <- df
  yearStart <- as.numeric(format(min(df$date),'%Y'))-1
  #Hydrological year in Poland starts at November 1st
  DateStart <- as.Date(paste(yearStart,"-11-01",sep=""))

  breaks <- seq(from=DateStart, to=max(df$date)+90, by="quarter")
  breakYear <- format(breaks,'%Y')

  #Please, do not create labels in such way.
  #Please note that for November and December we have next hydrological year - since it started at 1st November. So, we need to check month to decide which year we have (?) or use cut function again as mentioned here: http://stackoverflow.com/questions/22073881/hydrological-year-time-series
  labels <- c(paste("Winter",breakYear[1]),
           paste("Spring",breakYear[2]),
           paste("Summer",breakYear[3]),
           paste("Autumn",breakYear[4]),
           paste("Autumn",breakYear[5]))

  ######Here is problem - once I add labels parameter, function complains about different lengths
  result$hydroYear <- cut(df$date, breaks)

  result
}

1 answer:

Answer 0 (score: 2)

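First, I don't think it is wise to keep the labels as "hard-coded" variables inside the function, because they cannot be checked without some kind of reproducible example, but I can see what you are trying to achieve.

You say your breaks and labels should be the right length, but the function itself does not always work. This is without labels; even when the labels are present, the cut function does not handle the last, partial period of dates.

For example, when the sequence of breaks stops at max(df$date):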

library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="day")))
a <- ToHydroQuarters(df)

tail(a)

returns:

          date hydroYear
971 2011-08-29      <NA>
972 2011-08-30      <NA>
973 2011-08-31      <NA>
974 2011-09-01      <NA>
975 2011-09-02      <NA>
976 2011-09-03      <NA>
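
The NA values appear because cut() returns NA for anything that falls outside the supplied breaks. A minimal sketch of the effect, using three made-up dates and a breaks sequence that stops at the last date (both are illustrative assumptions, not the data above):

```r
# Three illustrative dates; the last two fall after the final break
dates <- as.Date(c("2011-07-15", "2011-08-29", "2011-09-03"))

# Quarterly breaks that stop at max(dates): 2011-02-01, 2011-05-01, 2011-08-01
breaks <- seq(from = as.Date("2011-02-01"), to = max(dates), by = "quarter")

# Only dates before the last break are covered by an interval; the rest are NA
res <- cut(dates, breaks)
is.na(res)
```

Only the first date lies before the last break, so it is the only one that gets a level.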

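Doing something like breaks <- seq(from=DateStart, to=max(df$date)+90, by="quarter") does fix this, because it forces a break to actually exist after the last date. It may also resolve the label problem you ran into in your function, but it does not make the function "generic".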
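
On the length complaint itself: when cut() is given a vector of breaks, n breaks define n - 1 intervals, so labels must have one element fewer than breaks. A minimal sketch, assuming illustrative dates and label text that are not from the question, derives the labels from the breaks so the two cannot get out of step:

```r
# Illustrative dates (an assumption, not the question's data)
dates <- as.Date(c("2013-04-28", "2013-08-29", "2013-12-29"))

# Quarter boundaries starting 1 November, extended past the last date
breaks <- seq(from = as.Date("2012-11-01"), to = max(dates) + 90, by = "quarter")

# n breaks give n - 1 intervals: label each interval by its starting break
labels <- paste("Q starting", format(breaks[-length(breaks)]))

quarter <- cut(dates, breaks = breaks, labels = labels)
as.character(quarter)
```

Because the labels are computed from breaks[-length(breaks)], changing the date range changes both vectors together.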

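As far as the coding goes, I think it would be better to convert the month and the year separately, because that is easier to understand. For example, you can easily extract the month with library(lubridate) and specify the breaks and labels as usual. I imagine the function would look something like this: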

thq <- function(date) {
  mnth <- cut(month(date), breaks=c(1,4,7, 10, 12), 
              right=FALSE, include.lowest=TRUE, 
              labels=c("Spring", "Summer", "Autumn", "Winter"))
  return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}

Then, using some dummy data...

library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="month")))

thq <- function(date) {
  mnth <- cut(month(date), breaks=c(1,4,7, 10, 12), 
              right=FALSE, include.lowest=TRUE, 
              labels=c("Spring", "Summer", "Autumn", "Winter"))
  return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}

df$newdate <- thq(df$date)

with the following output:

         date     newdate
1  2009-01-01 Spring 2009
2  2009-02-01 Spring 2009
3  2009-03-01 Spring 2009
4  2009-04-01 Summer 2009
5  2009-05-01 Summer 2009
6  2009-06-01 Summer 2009
7  2009-07-01 Autumn 2009
8  2009-08-01 Autumn 2009
9  2009-09-01 Autumn 2009
10 2009-10-01 Winter 2010
11 2009-11-01 Winter 2010
12 2009-12-01 Winter 2010
13 2010-01-01 Spring 2010
14 2010-02-01 Spring 2010
15 2010-03-01 Spring 2010
16 2010-04-01 Summer 2010
17 2010-05-01 Summer 2010
18 2010-06-01 Summer 2010
19 2010-07-01 Autumn 2010
20 2010-08-01 Autumn 2010
21 2010-09-01 Autumn 2010
22 2010-10-01 Winter 2011
23 2010-11-01 Winter 2011
24 2010-12-01 Winter 2011
25 2011-01-01 Spring 2011
26 2011-02-01 Spring 2011
27 2011-03-01 Spring 2011
28 2011-04-01 Summer 2011
29 2011-05-01 Summer 2011
30 2011-06-01 Summer 2011
31 2011-07-01 Autumn 2011
32 2011-08-01 Autumn 2011
33 2011-09-01 Autumn 2011

If the quarters do not line up with the calendar year, you can use the modulo operator to shift the months...

thq <- function(date) {
  # Shift the months so that November maps to 0 and October maps to 11.
  # Use the function argument here, not the global df$date.
  mnth <- cut(((month(date)+1) %% 12), breaks=c(0, 3, 6, 9, 12),
              right=FALSE, include.lowest=TRUE,
              labels=c("Nov_Jan", "Feb_Apr", "May_Jul", "Aug_Oct"))
  # You will need to alter the return statement yourself (no label equals
  # "Winter" any more, so the year is never shifted), because I feel there
  # is enough information for you to do it, rather than me changing it
  # every time you change the question.
  return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}

library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="day")))

df$new <- thq(df$date)

head(df)

Output:

> head(df)
        date          new
1 2009-01-01 Nov_Jan 2009
2 2009-01-02 Nov_Jan 2009
3 2009-01-03 Nov_Jan 2009
4 2009-01-04 Nov_Jan 2009
5 2009-01-05 Nov_Jan 2009
6 2009-01-06 Nov_Jan 2009
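
One possible way to finish the return statement is sketched below. It is not part of the original answer: hydro_quarter is a hypothetical helper, it uses base R instead of lubridate, and it assumes the hydrological year runs from 1 November to 31 October and is named after the calendar year in which it ends, so November and December are assigned to the following year.

```r
# Hypothetical helper: quarter label plus hydrological year (1 Nov to 31 Oct)
hydro_quarter <- function(date) {
  mon <- as.integer(format(date, "%m"))
  yr  <- as.integer(format(date, "%Y"))

  # Shift the months: Nov -> 0, Dec -> 1, Jan -> 2, ..., Oct -> 11
  shifted <- (mon + 1) %% 12
  qtr <- cut(shifted, breaks = c(0, 3, 6, 9, 12), right = FALSE,
             labels = c("Nov_Jan", "Feb_Apr", "May_Jul", "Aug_Oct"))

  # Assumption: November and December belong to the next hydrological year
  hydro_year <- ifelse(mon >= 11, yr + 1L, yr)
  paste(qtr, hydro_year)
}

out <- hydro_quarter(as.Date(c("2009-10-31", "2009-11-01", "2010-01-15", "2010-02-01")))
out
```

Testing on the month number rather than on the quarter label keeps the year shift correct even if the labels are renamed later.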