How can I split yearly time-based data into 36 parts with R?

Asked: 2015-09-07 19:46:07

Tags: r

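I have a df like the one below, covering 30 years up to 2015. For each month I want to split the data into three groups by day of month (1-10, 11-20, and 21-31) and take the average of each group; a group may contain fewer than 10 observations. Each month therefore yields three values, 36 per year. How can I do this?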

1993-01-29 28.92189
1993-02-01 29.12760
1993-02-02 29.18927
1993-02-03 29.49786
1993-02-04 29.62128
1993-02-05 29.60068
1993-02-08 29.60068
1993-02-09 29.39498
------
------
2015-08-18 209.92999
2015-08-19 208.28000
2015-08-20 204.01000
2015-08-21 197.63001
2015-08-24 189.55000
2015-08-25 187.23000
2015-08-26 194.67999
2015-08-27 199.16000
2015-08-28 199.24000

3 answers:

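Answer 0 (score: 1)

tryCatch is used to get around the problem caused by the data's start date (the first month has too few rows to index positions 11 and above). I'll add more information when I have time.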

library(xts)
dates<-seq(as.Date("1993-01-29"),as.Date("2015-08-25"),"days")
sample<-rnorm(length(dates))


tmpxts<-split.xts(xts(x = sample,order.by = dates),f = "months")

mxts<-lapply(tmpxts,function(x) {
  tmp<-data.frame(val=tryCatch(c(mean(x[1:10]),mean(x[11:20]),mean(x[21:length(x)])),
            error=function(e) matrix(mean(x),1)))
  row.names(tmp)<-tryCatch(index(x[c(1,11,21)]),error=function(e) index(x[1]))
  tmp
  })

do.call(rbind,mxts)
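
Note that the code above indexes each month by row position (1:10, 11:20, 21:end). That matches days 1-10, 11-20, and 21-31 only when every calendar day is present; the data in the question skips weekends. A minimal base R sketch that bins by calendar day instead is given below. The helper name dekad_means and the "YYYY-MM.block" key format are assumptions made for illustration, not part of the answer.

```r
# Hypothetical helper (not from the answer): mean per month and ten-day block,
# binned by calendar day of month rather than by row position.
dekad_means <- function(dates, values) {
  day   <- as.POSIXlt(dates)$mday            # calendar day of month, 1..31
  block <- findInterval(day, c(1, 11, 21))   # 1 = days 1-10, 2 = 11-20, 3 = 21-31
  key   <- paste(format(dates, "%Y-%m"), block, sep = ".")
  tapply(values, key, mean)                  # one mean per month/block present
}

# First rows of the data shown in the question
dates  <- as.Date(c("1993-01-29", "1993-02-01", "1993-02-02", "1993-02-03",
                    "1993-02-04", "1993-02-05", "1993-02-08", "1993-02-09"))
values <- c(28.92189, 29.12760, 29.18927, 29.49786,
            29.62128, 29.60068, 29.60068, 29.39498)

res <- dekad_means(dates, values)
print(res)
```

Only blocks that actually contain observations appear in the result, so no tryCatch is needed for a short first or last month.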

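Answer 1 (score: 0)

The code below splits each month into thirds, with the break points derived from the number of days (rows) in that month.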

library(dplyr)
library(lubridate)
library(ggplot2)

# Fake data
df = data.frame(date=seq.Date(as.Date("2013-01-01"), 
                              as.Date("2013-03-31"), by="day"))

set.seed(394)
df$value = rnorm(nrow(df), sqrt(1:nrow(df)), 2)

# Cut months into thirds
df = df %>% 
  # Create a new column to group by Year-Month
  mutate(yr_mon = paste0(year(date) , "_", month(date, label=TRUE, abbr=TRUE))) %>%
  group_by(yr_mon) %>%
  # Cut each month into thirds
  mutate(cutMonth = cut(day(date), 
                        breaks=c(0, round(1/3*n()), round(2/3*n()), n()),
                        labels=c("1st third","2nd third","3rd third")),
  # Add yr_mon to cutMonth so that we have a unique group label for 
  # each third of each month
         cutMonth = paste0(yr_mon, "\n", cutMonth)) %>%
  ungroup() %>%
  # Turn cutMonth into a factor with correct date ordering
  mutate(cutMonth = factor(cutMonth, levels=unique(cutMonth))) 

The result looks like this:

# Show number of observations in each group
as.data.frame(table(df$cutMonth))

                 Var1 Freq
1 2013_Jan\n1st third   10
2 2013_Jan\n2nd third   11
3 2013_Jan\n3rd third   10
4 2013_Feb\n1st third    9
5 2013_Feb\n2nd third   10
6 2013_Feb\n3rd third    9
7 2013_Mar\n1st third   10
8 2013_Mar\n2nd third   11
9 2013_Mar\n3rd third   10

# Plot means by group (just to visualize the result of the date grouping operations)
ggplot(df, aes(cutMonth, value)) +
  stat_summary(fun=mean, geom='point', size=4, colour="red") +  # use fun.y in ggplot2 < 3.3.0
  coord_cartesian(ylim=c(-0.2,10.2)) +
  theme(axis.text.x = element_text(size=14))
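
One caveat, as far as I can tell from the code: n() is the number of rows in the month group, so the breaks equal thirds of the calendar month only when every day is present. With trading-day data, days beyond n() fall outside the last break and become NA. The sketch below contrasts row-count breaks with fixed breaks at 10 and 20, using base R only; the vector days is made-up illustration data for a month with 20 observations.

```r
# Hypothetical month with gaps (weekends missing): 20 observations
days <- c(1, 2, 3, 4, 5, 8, 9, 10, 11, 12,
          15, 16, 17, 18, 19, 22, 23, 24, 25, 26)
n <- length(days)

# Breaks derived from the row count, as in the answer above
by_count <- cut(days, breaks = c(0, round(1/3 * n), round(2/3 * n), n),
                labels = c("1st third", "2nd third", "3rd third"))

# Fixed breaks matching the question's 1-10, 11-20, 21-31 groups
by_day <- cut(days, breaks = c(0, 10, 20, 31),
              labels = c("1-10", "11-20", "21-31"))

print(table(by_count, useNA = "ifany"))  # days 22-26 exceed n and become NA
print(table(by_day))
```

Inside the dplyr pipeline, the same idea would mean replacing the n()-based breaks with c(0, 10, 20, 31).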


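Answer 2 (score: 0)

Here is a base R solution that builds the break points from an increasing sequence of years, months, and the 1st, 11th, and 21st of each month. The base cut function by default treats each break as the closed right-hand end of an interval, and your specification calls for cuts at the 1st, 11th, and 21st (leaving the 10th and 20th in the lower intervals), so I set right=TRUE. Be aware of what this does to dates that fall exactly on a break: they land in the preceding interval, which is why 1993-02-01 is averaged together with 1993-01-29 under the label 1993-01-21 in the output below. With right=FALSE, which is the default of cut.Date, the 1st, 11th, and 21st would instead open their own intervals. The call as run, with right=TRUE: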

 tapply(dat$V2, cut.Date(dat$V1, 
                         breaks=as.Date( 
                                 apply( expand.grid( c(1,11,21), 1:12,  1993:2015), 1, 
                                  function( x) paste(rev(x), collapse="-")) ), right=TRUE), FUN=mean)


1993-01-01 1993-01-11 1993-01-21 1993-02-01 1993-02-11 1993-02-21 1993-03-01 
        NA         NA   29.02475   29.48412         NA         NA         NA 
snipped many empty intervals

The bottom of the result includes:

2015-07-21 2015-08-01 2015-08-11 2015-08-21 2015-09-01 2015-09-11 2015-09-21 
        NA         NA  204.96250  193.97200         NA         NA         NA 
2015-10-01 2015-10-11 2015-10-21 2015-11-01 2015-11-11 2015-11-21 2015-12-01 
        NA         NA         NA         NA         NA         NA         NA 
2015-12-11 
        NA
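
To see how the right argument affects dates that sit exactly on a break, here is a small sketch. The vectors x and breaks are made up for illustration, reusing a few dates in the style of the question; as far as I know, cut.Date labels each interval by its left break in both cases, and only the membership of the boundary dates changes.

```r
x <- as.Date(c("1993-01-29", "1993-02-01", "1993-02-02",
               "1993-02-10", "1993-02-11"))
breaks <- as.Date(c("1993-01-21", "1993-02-01", "1993-02-11", "1993-02-21"))

# right = TRUE: intervals are (left, right], so a date on a break
# joins the interval that ends there
closed_right <- cut(x, breaks = breaks, right = TRUE)

# right = FALSE (the cut.Date default): intervals are [left, right), so a date
# on a break opens its own interval, matching 1-10, 11-20, 21-31
closed_left <- cut(x, breaks = breaks, right = FALSE)

print(data.frame(date = x, right_TRUE = closed_right, right_FALSE = closed_left))
```

In this sketch 1993-02-01 and 1993-02-11 change groups between the two calls, while the other dates stay where they are.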