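I want to create an index column that identifies periods running from October of one year through September of the next. Here is a large sample dataset to illustrate the point. Note the panel structure of the data. Suppose I want to compute the mean for each stock within such a window, for example October 2011 to September 2012. Once I have the index column, I will do the following: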
meanDF = aggregate(cbind(A) ~ Index + Firm, df, FUN = mean)
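Besides the mean, I will perform many custom operations, so I would like to be able to swap my own function into the code above easily. Please help. Thank you very much.
Answer 0 (score: 1)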
Using your data, I created an index column ('yymm') in a four-digit year-month format, e.g. 1110 for October 2011.
dat <- read.csv("./input/p_df.csv")
dat$Date <- as.character(dat$Date)
dat$Date<-as.Date(dat$Date, format="%m/%d/%Y")
dat$yymm <- as.integer(format(dat$Date, format="%y%m"))  # integer, so it compares numerically with the period bounds
Create a matrix holding the start (row 1) and end (row 2) yymm value of each October-to-September period:
dd <- structure(c(1110, 1209, 1210, 1309, 1310, 1409, 1410, 1509), .Dim = c(2L, 4L))
[,1] [,2] [,3] [,4]
[1,] 1110 1210 1310 1410
[2,] 1209 1309 1409 1509
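If more periods are needed, the same matrix can be generated rather than typed by hand. A minimal sketch, assuming the periods start in the two-digit years 11 to 14 as in the matrix above (`start_years` is a name introduced here for illustration):

start_years <- 11:14  # two-digit year in which each period starts (assumed range)
dd <- rbind(start_years * 100 + 10,       # October of the start year, e.g. 1110
            (start_years + 1) * 100 + 9)  # September of the following year, e.g. 1209

Changing `start_years` is then the only edit required to cover a different span of years.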
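Subset the data into four separate data frames, one per period (column) of the matrix:
library(dplyr)  # provides %>% and filter()
df2 <- lapply(1:4, function(x) dat %>% filter(yymm >= dd[1, x] & yymm <= dd[2, x]))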
Group each data set by firm and summarise the mean of each variable (A to F):
plyr::llply(df2, function(x) x %>% group_by(Firm) %>% select(A:F) %>% summarise_each(funs(mean)))
[[1]]
Source: local data frame [5 x 7]
Firm A B C D E F
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 BOB IS Equity 145.9267 3316808 62.52732 84.29513 1957.7310 285642.5
2 GAIL IS Equity 370.0094 1106420 49.80055 82.06510 1268.4775 469232.8
3 ITC IS Equity 227.2641 6970928 48.01366 67.84061 7809.3682 1778660.0
4 MM IS Equity 720.6503 1704623 53.01366 36.21561 613.9769 443013.4
5 RIL IS Equity 771.9296 3915459 47.72951 22.04312 3274.5789 2528920.7
[[2]]
Source: local data frame [5 x 7]
Firm A B C D E F
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 BOB IS Equity 137.7357 5329819 64.82192 81.98227 2055.4590 281634.8
2 GAIL IS Equity 333.9021 1148524 53.84932 82.13927 1268.4770 423761.6
3 ITC IS Equity 311.1275 7100443 46.88767 74.57744 7890.6657 2456360.8
4 MM IS Equity 898.4038 1329277 55.72329 46.41512 614.4784 552200.7
5 RIL IS Equity 833.1956 3224021 50.81096 49.91264 3245.9668 2703932.9
[[3]]
Source: local data frame [5 x 7]
Firm A B C D E F
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 BOB IS Equity 146.6735 8628298 58.94795 81.65596 2133.6639 314165.4
2 GAIL IS Equity 383.4397 1279186 46.99178 82.22435 1268.4770 487096.0
3 ITC IS Equity 337.2251 6373170 49.96164 76.48013 7946.3991 2681621.5
4 MM IS Equity 1062.1181 1057952 53.12877 53.80728 616.1057 656305.1
5 RIL IS Equity 934.2914 3138729 47.23288 60.38028 3232.1816 3023599.4
[[4]]
Source: local data frame [5 x 7]
Firm A B C D E F
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 BOB IS Equity 181.0604 6415760 54.68493 85.77090 2176.903 394006.5
2 GAIL IS Equity 398.5686 1480755 40.84932 83.58569 1268.477 504064.4
3 ITC IS Equity 341.9144 7534123 44.30411 78.84935 8005.011 2736656.1
4 MM IS Equity 1250.7123 1084946 46.51781 62.64578 621.092 777771.6
5 RIL IS Equity 914.7201 3571817 42.55068 59.33441 3236.035 2960117.6
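In recent dplyr releases `summarise_each()` and `funs()` are deprecated in favour of `across()`. A sketch of the equivalent grouping step on a toy data frame (the values and the `period1` name are made up for illustration; only columns A and B are included to keep it short):

library(dplyr)

# Toy stand-in for one period's subset (hypothetical values)
period1 <- data.frame(
  Firm = c("X", "X", "Y", "Y"),
  A    = c(1, 3, 10, 20),
  B    = c(2, 4, 30, 50)
)

# Mean of every column from A to B within each firm
period_means <- period1 %>%
  group_by(Firm) %>%
  summarise(across(A:B, mean))

A custom function can be substituted for `mean`, for example `across(A:B, function(x) max(x) - min(x))`.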
Create an index for each period:
# ifelse() is vectorised, so no row-by-row loop is needed; anything outside periods 1 to 3 gets 4
dat$Index <- with(dat,
  ifelse(yymm >= dd[1,1] & yymm <= dd[2,1], 1,
  ifelse(yymm >= dd[1,2] & yymm <= dd[2,2], 2,
  ifelse(yymm >= dd[1,3] & yymm <= dd[2,3], 3, 4))))
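As an alternative to hard-coding the period bounds, the index can be derived directly from the date. This is a minimal sketch on toy data, not the approach used above: it labels each period by the calendar year of its October, and it assumes the column names Date, Firm and A from the question (the values are made up).

# Toy panel data (hypothetical values, column names as in the question)
df <- data.frame(
  Date = as.Date(c("2011-09-30", "2011-10-03", "2012-09-28", "2012-10-01",
                   "2011-10-03", "2012-10-01")),
  Firm = c("X", "X", "X", "X", "Y", "Y"),
  A    = c(1, 2, 4, 8, 10, 20)
)

yr <- as.integer(format(df$Date, "%Y"))
mo <- as.integer(format(df$Date, "%m"))

# October to December belong to the period starting that year;
# January to September belong to the period that started the previous October
df$Index <- ifelse(mo >= 10, yr, yr - 1L)

# Same aggregation as in the question; swap FUN for any custom function
meanDF <- aggregate(A ~ Index + Firm, data = df, FUN = mean)

Here the rows dated 2011-10-03 and 2012-09-28 for firm X fall in the same period (Index 2011), so their values of A are averaged together.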