I have quarterly fundamentals data from Compustat, like this:
fund<-data.frame(quarterlydate=as.Date(c("03/31/1966","06/30/1966"), '%m/%d/%Y'),
gvkey=c(1000,1000,1001,1001), tic=c("XTL", "XTL", "ABL","ABL"),
sales=c(70,75,20,22))
> fund
  quarterlydate gvkey tic sales
1    1966-03-31  1000 XTL    70
2    1966-06-30  1000 XTL    75
3    1966-03-31  1001 ABL    20
4    1966-06-30  1001 ABL    22
I also have daily price data from CRSP, like this:
prices<-data.frame(dailydate=seq(as.Date("1966/01/01"), as.Date("1966/06/30"), "days"), gvkey=c(rep(1000, 181),rep(1001, 181)),
tic=c(rep("XTL",181), rep("ABL",181)),
price=floor(runif(length(seq(as.Date("1966/01/01"), as.Date("1966/06/30"), "days")), min=0, max=50)))
> head(prices)
   dailydate gvkey tic price
1 1966-01-01  1000 XTL    44
2 1966-01-02  1000 XTL    42
3 1966-01-03  1000 XTL    42
4 1966-01-04  1000 XTL    16
5 1966-01-05  1000 XTL    27
6 1966-01-06  1000 XTL    36
> tail(prices)
     dailydate gvkey tic price
357 1966-06-25  1001 ABL     0
358 1966-06-26  1001 ABL    28
359 1966-06-27  1001 ABL     4
360 1966-06-28  1001 ABL    18
361 1966-06-29  1001 ABL    49
362 1966-06-30  1001 ABL     4
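Questions:
1) How can I merge quarterly and daily datasets like these to get a data frame similar to the one below?
2) How can I compute the average price for each quarter and assign that value to every day in the quarter? (the "average_quarterly_price" variable in the table below)
I'd like a merged data frame like this: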
     dailydate quarterlydate gvkey tic price sales average_quarterly_price
1   1966-01-01    1966-03-31  1000 XTL     1    70                      32
2   1966-01-02    1966-03-31  1000 XTL    10    70                      32
3   1966-01-03    1966-03-31  1000 XTL    14    70                      32
4   1966-01-04    1966-03-31  1000 XTL    29    70                      32
5   1966-01-05    1966-03-31  1000 XTL     1    70                      32
6   1966-01-06    1966-03-31  1000 XTL    43    70                      32
.
.
.
91  1966-04-01    1966-06-30  1000 XTL    11    75                      41
92  1966-04-02    1966-06-30  1000 XTL     8    75                      41
93  1966-04-03    1966-06-30  1000 XTL    16    75                      41
94  1966-04-04    1966-06-30  1000 XTL    14    75                      41
95  1966-04-05    1966-06-30  1000 XTL    14    75                      41
96  1966-04-06    1966-06-30  1000 XTL    20    75                      41
.
.
.
182 1966-01-01    1966-03-31  1001 ABL    18    20                      15
183 1966-01-02    1966-03-31  1001 ABL    10    20                      15
184 1966-01-03    1966-03-31  1001 ABL    13    20                      15
185 1966-01-04    1966-03-31  1001 ABL    13    20                      15
186 1966-01-05    1966-03-31  1001 ABL    11    20                      15
187 1966-01-06    1966-03-31  1001 ABL    13    20                      15
.
.
.
272 1966-04-01    1966-06-30  1001 ABL    14    22                      16
273 1966-04-02    1966-06-30  1001 ABL    21    22                      16
274 1966-04-03    1966-06-30  1001 ABL    18    22                      16
275 1966-04-04    1966-06-30  1001 ABL    18    22                      16
276 1966-04-05    1966-06-30  1001 ABL    17    22                      16
277 1966-04-06    1966-06-30  1001 ABL    18    22                      16
.
.
.
362 1966-06-30    1966-06-30  1001 ABL    22    22                      16
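Of course, I'm not sure this is the best format for the dataset, and I'd welcome suggestions. My ultimate goal is to be able to use the daily and quarterly data together in a single analysis. For example, I'd like to find the stocks that are in the top 20th percentile of quarterly return on assets and also have the lowest daily prices over the past 10 days.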
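As a rough sketch of the daily half of that example (lowest price over the past 10 days), a trailing minimum can be computed per stock in base R with `ave()`. This assumes "past 10 days" means the last 10 rows for each stock (trading days in real CRSP data, calendar days in this toy data) and that rows are sorted by date within each stock. `trailing_min` is a hypothetical helper, not a library function, and the price table is rebuilt with `set.seed()` only so the sketch is reproducible. The quarterly half (a top-20% cut) could then be applied to the merged table with `quantile()`.

```r
# Hypothetical helper: min of the current and previous (k - 1) values; NA until k values exist
trailing_min <- function(x, k = 10) {
  vapply(seq_along(x),
         function(i) if (i < k) NA_real_ else min(x[(i - k + 1):i]),
         numeric(1))
}

# Toy daily prices shaped like the question's `prices` (seed added for reproducibility)
set.seed(1)
days <- seq(as.Date("1966-01-01"), as.Date("1966-06-30"), by = "day")
prices <- data.frame(dailydate = rep(days, 2),
                     gvkey = rep(c(1000, 1001), each = length(days)),
                     price = floor(runif(2 * length(days), min = 0, max = 50)))

# The rolling window assumes chronological order within each stock
prices <- prices[order(prices$gvkey, prices$dailydate), ]

# Apply the trailing minimum separately to each stock's price series
prices$min10 <- ave(prices$price, prices$gvkey, FUN = trailing_min)

# Rank stocks by their 10-day minimum on the latest date
last_day <- prices[prices$dailydate == max(prices$dailydate), ]
last_day[order(last_day$min10), c("gvkey", "min10")]
```

The same `ave()` pattern works on the merged daily/quarterly table, so a quarterly filter and a daily rolling statistic can sit side by side in one data frame.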
Answer:
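Create a "yearqtr"-class column in each data frame, then left-join the two data frames on their common column names (here gvkey, tic and yearqtr). Finally, use ave to compute the quarterly averages.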
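library(zoo) # yearqtr class
fundq <- transform(fund, yearqtr = as.yearqtr(quarterlydate))
pricesq <- transform(prices, yearqtr = as.yearqtr(dailydate))
m <- merge(pricesq, fundq, all.x = TRUE)  # joins on gvkey, tic, yearqtr
m <- m[order(m$gvkey, m$dailydate), ]     # merge() sorts by key; restore daily order
m <- transform(m, average_quarterly_price = ave(price, tic, yearqtr))  # mean per stock-quarter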
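If you would rather not depend on zoo, a similar join can be sketched in base R by building a calendar-quarter key with `quarters()`. This assumes each `quarterlydate` falls in the calendar quarter it reports on, as in the example; for firms whose fiscal quarters do not line up with calendar quarters, the key would need to be built differently. `qtr_key` is a hypothetical helper, and the data are rebuilt with `set.seed()` so the sketch runs on its own. The last few lines show the alternative layout the question asks about: one row per stock-quarter, with the average price merged onto the fundamentals instead of repeated on every day.

```r
# Rebuild data shaped like the question's `fund` and `prices` (seed added for reproducibility)
set.seed(1)
days <- seq(as.Date("1966-01-01"), as.Date("1966-06-30"), by = "day")
fund <- data.frame(quarterlydate = rep(as.Date(c("1966-03-31", "1966-06-30")), 2),
                   gvkey = c(1000, 1000, 1001, 1001),
                   tic = c("XTL", "XTL", "ABL", "ABL"),
                   sales = c(70, 75, 20, 22))
prices <- data.frame(dailydate = rep(days, 2),
                     gvkey = rep(c(1000, 1001), each = length(days)),
                     tic = rep(c("XTL", "ABL"), each = length(days)),
                     price = floor(runif(2 * length(days), min = 0, max = 50)))

# Hypothetical helper: calendar-quarter label such as "1966 Q1", base R only
qtr_key <- function(d) paste(format(d, "%Y"), quarters(d))
fund$yq   <- qtr_key(fund$quarterlydate)
prices$yq <- qtr_key(prices$dailydate)

# Layout 1: one row per day, with that quarter's fundamentals attached (left join)
m <- merge(prices, fund, by = c("gvkey", "tic", "yq"), all.x = TRUE)
m <- m[order(m$gvkey, m$dailydate), ]
rownames(m) <- NULL
m$average_quarterly_price <- ave(m$price, m$gvkey, m$yq)  # FUN defaults to mean

# Layout 2: one row per stock-quarter, average price merged onto the fundamentals
qavg <- aggregate(price ~ gvkey + tic + yq, data = prices, FUN = mean)
names(qavg)[names(qavg) == "price"] <- "average_quarterly_price"
fundq <- merge(fund, qavg, by = c("gvkey", "tic", "yq"))
```

Keeping both tables and joining on the stock-quarter key when needed avoids repeating quarterly values on every daily row, while the daily table stays available for rolling-window calculations.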