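Using aggregate to process a data frame

Time: 2020-08-26 12:23:52

Tags: r aggregate

I would like to improve two things in my code, but I don't know how. Suppose I have the code below.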

    # Create artificial dates and variables
    date <- seq(as.Date('2000-11-01'), as.Date('2020-06-01'), by = '1 month')
    x <- rnorm(length(date))
    y <- runif(length(date))
    z <- rexp(length(date))
    df3 <- data.frame(date, x, y, z)

    # Convert a monthly data frame to quarterly data by taking the mean of each variable
    agg_quarter <- function(data) {
      # Add the grouping columns once; the first column of data holds the dates
      keyed <- transform(data, quarter = quarters(data[, 1]),
                         year = as.integer(format(data[, 1], '%Y')))
      # Convert the first variable (second column, because the first one is the date column)
      first_conv <- aggregate(data[, 2] ~ quarter + year, keyed, mean)
      final <- first_conv[order(first_conv$year), ]
      # Append each remaining converted variable to the result
      for (i in 3:length(data)) {
        next_conv <- aggregate(data[, i] ~ quarter + year, keyed, mean)
        final <- cbind(final, next_conv[order(next_conv$year), ][, 3])
      }
      # Restore the original variable names so the output columns read x, y, z
      names(final)[-(1:2)] <- names(data)[-1]
      final
    }

    agg_quarter(df3)
       quarter year           x         y         z
    1       Q4 2000  0.45942451 0.7929780 0.7539960
    2       Q1 2001  1.23138264 0.4747510 0.5584441
    3       Q2 2001  0.44231712 0.7679884 0.6885684
    4       Q3 2001 -0.13386136 0.3885452 1.1084284
    5       Q4 2001  0.59844310 0.7335526 0.3695344
    6       Q1 2002  1.24900446 0.2888266 0.4352385
    7       Q2 2002  0.43844015 0.5924775 0.8822304
    8       Q3 2002  0.37774550 0.8112969 1.3918558
    9       Q4 2002  0.57288044 0.6549678 0.5810611
    10      Q1 2003  0.14598182 0.4013501 0.4013333
    11      Q2 2003 -0.62909627 0.5509173 1.2683533
    12      Q3 2003 -0.71456619 0.3974511 1.2887639
    13      Q4 2003  0.45491216 0.6072442 0.6193801
    14      Q1 2004  0.01672842 0.4396954 1.1243245
    15      Q2 2004 -0.39333422 0.7621499 1.2563600
    16      Q3 2004 -0.39156951 0.5565475 1.2788598
    17      Q4 2004  1.10912010 0.4413196 0.9220222
    18      Q1 2005 -0.40701050 0.7760670 0.8635499
    19      Q2 2005  0.30103801 0.5224330 1.0501535
    20      Q3 2005  0.45887917 0.2483629 0.6272839
    21      Q4 2005  0.02241649 0.5932321 1.2629598
    22      Q1 2006  0.41543193 0.1908029 0.5970694
    23      Q2 2006 -0.98873460 0.6352816 1.7712819
    24      Q3 2006 -0.10828167 0.1545828 1.6988625
    25      Q4 2006 -0.73658041 0.3856345 0.8629566
    26      Q1 2007  0.15195634 0.2793902 1.0515791
    27      Q2 2007 -0.30395748 0.6836821 1.2211982
    28      Q3 2007  0.79359011 0.4844654 1.0220515
    29      Q4 2007  0.16158802 0.6107544 0.6732919
    30      Q1 2008  0.52899548 0.4588903 0.8438358
    31      Q2 2008  1.92334521 0.4538614 1.1388537
    32      Q3 2008 -0.80191834 0.5997541 0.6089949
    33      Q4 2008 -0.57103869 0.7024573 0.2376695
    34      Q1 2009 -0.24723440 0.5005447 0.2627463
    35      Q2 2009 -0.30381117 0.4626645 1.3784519
    36      Q3 2009 -1.17451628 0.6690295 1.6415845
    37      Q4 2009 -0.87982746 0.8445355 0.3022675
    38      Q1 2010  0.07076255 0.3292379 0.9728483
    39      Q2 2010  0.07184322 0.5096927 1.0615695
    40      Q3 2010 -0.26468911 0.3171100 0.4730112
    41      Q4 2010 -0.69391437 0.4562580 1.8500731
    42      Q1 2011  0.21756054 0.6201900 0.4027133
    43      Q2 2011  0.17217771 0.4929368 0.5876891
    44      Q3 2011 -0.42497597 0.3867183 0.7277180
    45      Q4 2011 -0.21056633 0.5237307 0.6656154
    46      Q1 2012  0.69480678 0.5872376 0.6828554
    47      Q2 2012 -0.47632856 0.3717660 1.3673013
    48      Q3 2012  0.27589228 0.4867553 1.0885047
    49      Q4 2012  0.44380526 0.6387452 0.7829597
    50      Q1 2013 -0.05414971 0.5554888 0.1508641
    51      Q2 2013 -0.75089399 0.8138965 1.4438334
    52      Q3 2013  0.92820143 0.5835876 1.4205822
    53      Q4 2013  0.44616063 0.6300494 0.3126694
    54      Q1 2014 -0.41365654 0.2320446 1.0940730
    55      Q2 2014 -1.11164312 0.4898071 0.1683786
    56      Q3 2014  0.14569456 0.6576827 1.0650230
    57      Q4 2014  0.06138058 0.6081637 0.9863737
    58      Q1 2015  0.09543675 0.0961604 0.7548227
    59      Q2 2015 -0.47135801 0.8781596 0.7984845
    60      Q3 2015 -0.26276167 0.7056564 0.7241858
    61      Q4 2015  0.71868865 0.4662655 0.9601569
    62      Q1 2016  0.56489638 0.7037970 1.9201958
    63      Q2 2016  0.66280429 0.3777969 1.0581366
    64      Q3 2016 -0.70528460 0.7409868 0.8436434
    65      Q4 2016  0.31846701 0.2421001 1.8622658
    66      Q1 2017  0.56373102 0.3889981 1.5053578
    67      Q2 2017 -0.23500096 0.4646629 0.4574373
    68      Q3 2017  0.62223194 0.5856600 0.5756991
    69      Q4 2017  0.25810799 0.5065837 1.9082879
    70      Q1 2018 -0.09286488 0.3999487 0.9742502
    71      Q2 2018 -0.48238163 0.2719190 1.3198815
    72      Q3 2018 -0.29015459 0.7341444 0.5792056
    73      Q4 2018  0.07895146 0.4277270 3.2260907
    74      Q1 2019 -0.55417141 0.4413805 0.5651649
    75      Q2 2019  1.19594061 0.5188568 0.4132742
    76      Q3 2019 -0.35045666 0.4089393 1.8479158
    77      Q4 2019  1.17132684 0.3002464 0.5272597
    78      Q1 2020 -0.69268958 0.4336187 0.5031869
    79      Q2 2020 -0.05743833 0.4061218 0.7290067

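I have two questions about improving the code above.

(1) Is it possible to combine my quarter and year columns into one? As you can see in the output, they are currently separate columns, which I think is unnecessary.

(2) Is there another way to do this? What the code above does is convert the second column and then append each further converted variable. I am looking for a way to convert the whole data frame at once. Is that feasible?

1 Answer:

Answer 0: (score: 2)

Using the reproducible data shown in the Note at the end and the yearqtr class (from the zoo package), this can be reduced to a single line of code: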

    library(zoo)

    aggregate(df3[-1], list(yq = as.yearqtr(df3[[1]])), mean)

giving:

            yq            x         y         z
    1  2000 Q4 -0.395326568 0.5916578 1.4637230
    2  2001 Q1  0.586168147 0.5780197 0.4912218
    3  2001 Q2  0.303639986 0.3075270 0.3069566
    4  2001 Q3  0.030522325 0.4456214 1.9126493
    5  2001 Q4  0.290422665 0.5787345 1.1096687
    6  2002 Q1  0.576307493 0.4116260 1.6420619
    ...etc...
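The yq column above is a yearqtr object rather than plain text, so it can be relabelled or converted to dates if a different presentation is needed. The following is a minimal sketch of that, assuming the zoo package is installed; the three dates and the variable names (d, labels, starts, ends) are made up for illustration and are not part of the original answer.

```r
library(zoo)

# Three illustrative monthly dates spanning two quarters (hypothetical values)
d  <- as.Date(c("2000-11-01", "2000-12-01", "2001-01-01"))
yq <- as.yearqtr(d)

# Custom text label; %q is the quarter number in format.yearqtr
labels <- format(yq, "%Y-Q%q")   # "2000-Q4" "2000-Q4" "2001-Q1"

# Convert back to dates: frac = 0 (default) is the first day of the quarter,
# frac = 1 is the last day
starts <- as.Date(yq)            # first element: 2000-10-01
ends   <- as.Date(yq, frac = 1)  # first element: 2000-12-31
```

Keeping the column as yearqtr until the final presentation step has the advantage that it sorts chronologically, which arbitrary text labels do not necessarily do.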

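This can also be done with aggregate.zoo, as follows. The code creates a zoo series z, aggregates it using aggregate.zoo, and then converts the result back to a data.frame. If a zoo object is acceptable as the result, the last line can be omitted.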

    z <- read.zoo(df3)
    ag <- aggregate(z, as.yearqtr, mean)
    fortify.zoo(ag, names = "yq")
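As a sanity check, the data.frame route and the zoo route can be run side by side and compared. This is only a sketch: it rebuilds the data the same way as the Note does, and the names by_df, by_zoo and same are hypothetical helpers introduced here.

```r
library(zoo)

# Reproducible monthly data, constructed as in the Note
set.seed(123)
date <- seq(as.Date("2000-11-01"), as.Date("2020-06-01"), by = "1 month")
n <- length(date)
df3 <- data.frame(date, x = rnorm(n), y = runif(n), z = rexp(n))

# Route 1: aggregate the data frame directly, grouping by yearqtr
by_df <- aggregate(df3[-1], list(yq = as.yearqtr(df3[[1]])), mean)

# Route 2: go through a zoo series and convert back to a data frame
by_zoo <- fortify.zoo(aggregate(read.zoo(df3), as.yearqtr, mean), names = "yq")

# Compare the quarterly means column by column (up to floating-point tolerance)
same <- sapply(c("x", "y", "z"),
               function(v) isTRUE(all.equal(by_df[[v]], by_zoo[[v]])))
same
```

If every element of same is TRUE, the choice between the two routes comes down to whether a data.frame or a zoo object is more convenient downstream.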

Note

To make the data reproducible we must use set.seed:

    set.seed(123)

    date <- seq(as.Date('2000-11-01'), as.Date('2020-06-01'),
      by = '1 month')
    n <- length(date)
    df3 <- data.frame(date, x = rnorm(n), y = runif(n), z = rexp(n))
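If adding zoo as a dependency is not an option, a similar result can probably be had in base R by building a single text key from the year and the quarter. This is a sketch of an alternative, not the method from the answer; the names yq and res are chosen here for illustration, and the key is a plain character label rather than a yearqtr object.

```r
# Reproducible monthly data, constructed as in the Note
set.seed(123)
date <- seq(as.Date("2000-11-01"), as.Date("2020-06-01"), by = "1 month")
n <- length(date)
df3 <- data.frame(date, x = rnorm(n), y = runif(n), z = rexp(n))

# One grouping key such as "2000 Q4"; the year comes first so that the
# text labels sort in chronological order
yq <- paste(format(df3$date, "%Y"), quarters(df3$date))

# Average every non-date column within each quarter in a single call
res <- aggregate(df3[-1], list(yq = yq), mean)
head(res)
```

The year-first ordering matters because aggregate sorts the groups by their labels; a key like "Q4 2000" would group all fourth quarters together instead of following the calendar.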