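Calculating deciles by calendar year and for different columns using R

Date: 2014-11-21 21:43:56

Tags: r time-series dplyr quantile

I created the following dataset using dplyr and the tbl_df() function: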

date     X1    X2
1  2001-01-31 4.698648 4.640957
2  2001-02-28 4.491493 4.398382
3  2001-03-30 4.101235 4.074065
4  2001-04-30 4.072041 4.217999
5  2001-05-31 3.856718 4.114061
6  2001-06-29 3.909194 4.142691
7  2001-07-31 3.489640 3.678374
8  2001-08-31 3.327068 3.534823
9  2001-09-28 2.476066 2.727257
10 2001-10-31 2.015936 2.299102
11 2001-11-30 2.127617 2.590702
12 2001-12-31 2.162643 2.777744
13 2002-01-31 2.221636 2.740961
14 2002-02-28 2.276458 2.834494
15 2002-03-28 2.861650 3.472853
16 2002-04-30 2.402687 3.026207
17 2002-05-31 2.426250 2.968679
18 2002-06-28 2.045413 2.523772
19 2002-07-31 1.468695 1.677434
20 2002-08-30 1.707742 1.920101
21 2002-09-30 1.449055 1.554702
22 2002-10-31 1.350024 1.466806
23 2002-11-29 1.541507 1.844471
24 2002-12-31 1.208786 1.392031

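I am interested in calculating the deciles for each year and each column: for example, the deciles of X1 for 2001, the deciles of X2 for 2001, the deciles of X1 for 2002, the deciles of X2 for 2002, and so on if I have more years and more columns. I have tried:

quantile(x, prob = seq(0, 1, length = 11), type = 5), and also quantile() combined with apply.yearly() on an xts object built from x (my data frame above), but neither computes what I actually need. Any help would be greatly appreciated.

2 answers:

Answer 0: (score: 0)

Assuming you have a plain data.frame, first bin the dates by year: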

df$year <- cut(as.Date(df$date), "year")

Then aggregate by year:

foo <- aggregate(. ~ year, subset(df, select = -date), quantile,
                 probs = seq(0, 1, length.out = 11), type = 5)

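This returns a data frame, but it needs some cleaning up, because each of X1 and X2 comes back as a matrix of quantiles with one row per year. Using unnest from the development version of tidyr together with lapply, you can do the following. Note that the first row for X1 is 2001 and the second is 2002.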

devtools::install_github("hadley/tidyr")
library(tidyr)

unnest(lapply(foo[-1], as.data.frame), column)

#  column       0%      10%      20%      30%      40%      50%      60%      70%      80%      90%     100%
#1     X1 2.015936 2.094113 2.159140 2.561166 3.375840 3.673179 3.893451 4.055756 4.140261 4.553640 4.698648
#2     X1 1.208786 1.307653 1.439152 1.475976 1.591378 1.876578 2.168769 2.270976 2.405043 2.556870 2.861650
#3     X2 2.299102 2.503222 2.713601 2.853452 3.577888 3.876219 4.102062 4.139828 4.236037 4.471155 4.640957
#4     X2 1.392031 1.444374 1.545912 1.694138 1.867160 2.221936 2.675804 2.825141 2.974432 3.160201 3.472853
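The unnest() call above relied on a development version of tidyr from the time of this answer and may not behave the same way in later releases. As a hedged alternative, here is a minimal base R sketch of the same clean-up step. The name `flat` and the added `year` column are my own choices, not part of the original answer; the data are copied from the question.

```r
# Rebuild the question's data as a plain data.frame (values copied from the post).
df <- data.frame(
  date = as.Date(c(
    "2001-01-31", "2001-02-28", "2001-03-30", "2001-04-30", "2001-05-31", "2001-06-29",
    "2001-07-31", "2001-08-31", "2001-09-28", "2001-10-31", "2001-11-30", "2001-12-31",
    "2002-01-31", "2002-02-28", "2002-03-28", "2002-04-30", "2002-05-31", "2002-06-28",
    "2002-07-31", "2002-08-30", "2002-09-30", "2002-10-31", "2002-11-29", "2002-12-31")),
  X1 = c(4.698648, 4.491493, 4.101235, 4.072041, 3.856718, 3.909194,
         3.489640, 3.327068, 2.476066, 2.015936, 2.127617, 2.162643,
         2.221636, 2.276458, 2.861650, 2.402687, 2.426250, 2.045413,
         1.468695, 1.707742, 1.449055, 1.350024, 1.541507, 1.208786),
  X2 = c(4.640957, 4.398382, 4.074065, 4.217999, 4.114061, 4.142691,
         3.678374, 3.534823, 2.727257, 2.299102, 2.590702, 2.777744,
         2.740961, 2.834494, 3.472853, 3.026207, 2.968679, 2.523772,
         1.677434, 1.920101, 1.554702, 1.466806, 1.844471, 1.392031)
)

# Same two steps as in the answer: bin by year, then aggregate.
df$year <- cut(df$date, "year")
foo <- aggregate(. ~ year, subset(df, select = -date), quantile,
                 probs = seq(0, 1, length.out = 11), type = 5)

# foo$X1 and foo$X2 are matrices with one row per year. Bind each one to
# the year and the column name, then stack the pieces into one data frame.
flat <- do.call(rbind, lapply(names(foo)[-1], function(col) {
  data.frame(year = foo$year, column = col, foo[[col]], check.names = FALSE)
}))
flat
```

Compared with the output above, this keeps an explicit year column, so you do not have to remember which row belongs to which year.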

Answer 1: (score: 0)

You can try the following function:

df<- read.table(header=T,text='date     X1    X2
1  2001/01/31 4.698648 4.640957
2  2001/02/28 4.491493 4.398382
3  2001/03/30 4.101235 4.074065
4  2001/04/30 4.072041 4.217999
5  2001/05/31 3.856718 4.114061
6  2001/06/29 3.909194 4.142691
7  2001/07/31 3.489640 3.678374
8  2001/08/31 3.327068 3.534823
9  2001/09/28 2.476066 2.727257
10 2001/10/31 2.015936 2.299102
11 2001/11/30 2.127617 2.590702
12 2001/12/31 2.162643 2.777744
13 2002/01/31 2.221636 2.740961
14 2002/02/28 2.276458 2.834494
15 2002/03/28 2.861650 3.472853
16 2002/04/30 2.402687 3.026207
17 2002/05/31 2.426250 2.968679
18 2002/06/28 2.045413 2.523772
19 2002/07/31 1.468695 1.677434
20 2002/08/30 1.707742 1.920101
21 2002/09/30 1.449055 1.554702
22 2002/10/31 1.350024 1.466806
23 2002/11/29 1.541507 1.844471
24 2002/12/31 1.208786 1.392031')

find_quantile <- function(df,year,col,quant) { 
  year_df <- subset(df,year==substring(as.character(date),1,4))
  a <- quantile(year_df[,col] , quant)
  return(a)
}
# where df is the data frame,
# year is the year you want (as a character string),
# col is the column whose quantile you want (as an index, i.e. 2 or 3 in your case),
# quant is the quantile (a probability between 0 and 1)

For example:

> find_quantile(df,'2001',2,0.7) #specify the year as character
     70% 
4.023187
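find_quantile handles one year, one column and one quantile per call. To get all deciles for every year and column in one go, which is what the question asks for, the same idea can be wrapped in a loop. The following is only a sketch: `deciles_by_year` is a hypothetical helper that is not part of the answer, and the small data frame holds just three rows per year copied from the question.

```r
# A few rows copied from the question's data, three per year.
df <- data.frame(
  date = c("2001/01/31", "2001/06/29", "2001/12/31",
           "2002/01/31", "2002/06/28", "2002/12/31"),
  X1 = c(4.698648, 3.909194, 2.162643, 2.221636, 2.045413, 1.208786),
  X2 = c(4.640957, 4.142691, 2.777744, 2.740961, 2.523772, 1.392031)
)

# Hypothetical helper (not from the answer): quantiles of each column in
# `cols`, computed separately for every year found in df$date.
deciles_by_year <- function(df, cols, probs = seq(0, 1, by = 0.1), type = 7) {
  # Same year extraction as find_quantile: first four characters of the date.
  years <- substring(as.character(df$date), 1, 4)
  out <- lapply(cols, function(col) {
    # Split the column by year, compute the quantiles, one row per year.
    t(sapply(split(df[[col]], years), quantile, probs = probs, type = type))
  })
  names(out) <- cols
  out
}

res <- deciles_by_year(df, c("X1", "X2"))
res$X1                  # matrix: rows are years, columns are 0%, 10%, ..., 100%
res$X1["2001", "50%"]   # median of the three 2001 values of X1
```

Here type = 7 is quantile()'s default, which is also what find_quantile uses. Pass type = 5 to match the call in the question.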