I created the following dataset using dplyr and the tbl_df() function:
date X1 X2
1 2001-01-31 4.698648 4.640957
2 2001-02-28 4.491493 4.398382
3 2001-03-30 4.101235 4.074065
4 2001-04-30 4.072041 4.217999
5 2001-05-31 3.856718 4.114061
6 2001-06-29 3.909194 4.142691
7 2001-07-31 3.489640 3.678374
8 2001-08-31 3.327068 3.534823
9 2001-09-28 2.476066 2.727257
10 2001-10-31 2.015936 2.299102
11 2001-11-30 2.127617 2.590702
12 2001-12-31 2.162643 2.777744
13 2002-01-31 2.221636 2.740961
14 2002-02-28 2.276458 2.834494
15 2002-03-28 2.861650 3.472853
16 2002-04-30 2.402687 3.026207
17 2002-05-31 2.426250 2.968679
18 2002-06-28 2.045413 2.523772
19 2002-07-31 1.468695 1.677434
20 2002-08-30 1.707742 1.920101
21 2002-09-30 1.449055 1.554702
22 2002-10-31 1.350024 1.466806
23 2002-11-29 1.541507 1.844471
24 2002-12-31 1.208786 1.392031
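I am interested in computing the deciles for each year and each column: for example, the 2001 deciles of X1, the 2001 deciles of X2, the 2002 deciles of X1, the 2002 deciles of X2, and so on if I have more years and more columns. I tried: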
quantile(x, prob = seq(0, 1, length = 11), type = 5)
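or using apply.yearly() with the quantile() function on an xts object built from x (my data frame above), but neither of them computes what I actually need. Any help would be greatly appreciated.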
Answer 0: (score: 0)
Assuming you have a plain data.frame, first bin the dates by year:
df$year <- cut(as.Date(df$date), "year")
Then aggregate by year:
foo <- aggregate(. ~ year, subset(df, select=-date), quantile,
prob = seq(0, 1, length = 11), type = 5)
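This returns a data frame, but it needs some cleaning, because the deciles of each variable are stored as a matrix inside a single column. Using unnest from the development version of tidyr together with lapply, you can do the following. Note that for X1 the first row is 2001 and the second row is 2002.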
devtools::install_github("hadley/tidyr")
library(tidyr)
unnest(lapply(foo[-1], as.data.frame), column)
# column 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
#1 X1 2.015936 2.094113 2.159140 2.561166 3.375840 3.673179 3.893451 4.055756 4.140261 4.553640 4.698648
#2 X1 1.208786 1.307653 1.439152 1.475976 1.591378 1.876578 2.168769 2.270976 2.405043 2.556870 2.861650
#3 X2 2.299102 2.503222 2.713601 2.853452 3.577888 3.876219 4.102062 4.139828 4.236037 4.471155 4.640957
#4 X2 1.392031 1.444374 1.545912 1.694138 1.867160 2.221936 2.675804 2.825141 2.974432 3.160201 3.472853
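The unnest() interface has changed between tidyr releases, so the call above may not run on a current installation. As a hedged alternative, here is a minimal base R sketch of the same cleanup. It assumes the question's data and the aggregate() step shown earlier; the name flat is only illustrative.

```r
# Data from the question, read into a plain data.frame.
df <- read.table(header = TRUE, text = "date X1 X2
2001-01-31 4.698648 4.640957
2001-02-28 4.491493 4.398382
2001-03-30 4.101235 4.074065
2001-04-30 4.072041 4.217999
2001-05-31 3.856718 4.114061
2001-06-29 3.909194 4.142691
2001-07-31 3.489640 3.678374
2001-08-31 3.327068 3.534823
2001-09-28 2.476066 2.727257
2001-10-31 2.015936 2.299102
2001-11-30 2.127617 2.590702
2001-12-31 2.162643 2.777744
2002-01-31 2.221636 2.740961
2002-02-28 2.276458 2.834494
2002-03-28 2.861650 3.472853
2002-04-30 2.402687 3.026207
2002-05-31 2.426250 2.968679
2002-06-28 2.045413 2.523772
2002-07-31 1.468695 1.677434
2002-08-30 1.707742 1.920101
2002-09-30 1.449055 1.554702
2002-10-31 1.350024 1.466806
2002-11-29 1.541507 1.844471
2002-12-31 1.208786 1.392031")

# Bin the dates by calendar year, then compute deciles per year and column.
df$year <- cut(as.Date(df$date), "year")
foo <- aggregate(. ~ year, subset(df, select = -date), quantile,
                 probs = seq(0, 1, length.out = 11), type = 5)

# aggregate() keeps each variable's deciles as one matrix column;
# do.call(data.frame, ...) expands them into ordinary numeric columns,
# giving one row per year and one column per variable/decile pair.
flat <- do.call(data.frame, foo)
print(flat)
```

This keeps one row per year rather than one row per year and variable, so the layout differs from the unnest() output above, but the values are the same deciles.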
Answer 1: (score: 0)
You can try the following function:
df<- read.table(header=T,text='date X1 X2
1 2001/01/31 4.698648 4.640957
2 2001/02/28 4.491493 4.398382
3 2001/03/30 4.101235 4.074065
4 2001/04/30 4.072041 4.217999
5 2001/05/31 3.856718 4.114061
6 2001/06/29 3.909194 4.142691
7 2001/07/31 3.489640 3.678374
8 2001/08/31 3.327068 3.534823
9 2001/09/28 2.476066 2.727257
10 2001/10/31 2.015936 2.299102
11 2001/11/30 2.127617 2.590702
12 2001/12/31 2.162643 2.777744
13 2002/01/31 2.221636 2.740961
14 2002/02/28 2.276458 2.834494
15 2002/03/28 2.861650 3.472853
16 2002/04/30 2.402687 3.026207
17 2002/05/31 2.426250 2.968679
18 2002/06/28 2.045413 2.523772
19 2002/07/31 1.468695 1.677434
20 2002/08/30 1.707742 1.920101
21 2002/09/30 1.449055 1.554702
22 2002/10/31 1.350024 1.466806
23 2002/11/29 1.541507 1.844471
24 2002/12/31 1.208786 1.392031')
find_quantile <- function(df,year,col,quant) {
year_df <- subset(df,year==substring(as.character(date),1,4))
a <- quantile(year_df[,col] , quant)
return(a)
}
#where df is the data frame,
#year is the year you want (as a character string),
#col is the column to compute the quantile for (as an index, i.e. 2 or 3 in your case),
#quant is the quantile (a probability between 0 and 1)
For example:
> find_quantile(df,'2001',2,0.7) #specify the year as character
70%
4.023187
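Since the question asks for all deciles of every year and every column, the function can be called in a loop. The following is only a sketch under a few assumptions: it repeats df and find_quantile so that it runs on its own, the names probs, years, cols and result are illustrative, and it uses quantile()'s default type rather than the type = 5 from the question.

```r
# Data and helper repeated from above so the block is self-contained.
df <- read.table(header = TRUE, text = "date X1 X2
2001/01/31 4.698648 4.640957
2001/02/28 4.491493 4.398382
2001/03/30 4.101235 4.074065
2001/04/30 4.072041 4.217999
2001/05/31 3.856718 4.114061
2001/06/29 3.909194 4.142691
2001/07/31 3.489640 3.678374
2001/08/31 3.327068 3.534823
2001/09/28 2.476066 2.727257
2001/10/31 2.015936 2.299102
2001/11/30 2.127617 2.590702
2001/12/31 2.162643 2.777744
2002/01/31 2.221636 2.740961
2002/02/28 2.276458 2.834494
2002/03/28 2.861650 3.472853
2002/04/30 2.402687 3.026207
2002/05/31 2.426250 2.968679
2002/06/28 2.045413 2.523772
2002/07/31 1.468695 1.677434
2002/08/30 1.707742 1.920101
2002/09/30 1.449055 1.554702
2002/10/31 1.350024 1.466806
2002/11/29 1.541507 1.844471
2002/12/31 1.208786 1.392031")

find_quantile <- function(df, year, col, quant) {
  # Keep only the rows whose date starts with the requested year.
  year_df <- subset(df, year == substring(as.character(date), 1, 4))
  quantile(year_df[, col], quant)
}

probs <- seq(0, 1, length.out = 11)                       # the 11 decile cut points
years <- unique(substring(as.character(df$date), 1, 4))   # "2001", "2002"
cols  <- c("X1", "X2")

# One list entry per year; each entry is a matrix with one row per decile
# and one column per variable.
result <- lapply(setNames(years, years), function(y) {
  sapply(cols, function(cl) find_quantile(df, y, cl, probs))
})
print(result)
```

Adding more years or columns only requires more rows in df or more names in cols; passing type = 5 through to quantile() would need an extra argument on find_quantile.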