如何在R中混合每日和每季度数据?

时间:2016-11-12 00:52:19

标签: r data-manipulation

我有来自Compustat的季度基本数据,如下所示:

fund<-data.frame(quarterlydate=as.Date(c("03/31/1966","06/30/1966"), '%m/%d/%Y'), 
               gvkey=c(1000,1000,1001,1001), tic=c("XTL", "XTL", "ABL","ABL"), 
               sales=c(70,75,20,22))

> fund
  quarterlydate gvkey tic sales
1    1966-03-31  1000 XTL    70
2    1966-06-30  1000 XTL    75
3    1966-03-31  1001 ABL    20
4    1966-06-30  1001 ABL    22

我还有来自CRSP的每日价格数据,如下所示:

prices<-data.frame(dailydate=seq(as.Date("1966/01/01"), as.Date("1966/06/30"), "days"), gvkey=c(rep(1000, 181),rep(1001, 181)),
                   tic=c(rep("XTL",181), rep("ABL",181)),
                         price=floor(runif(length(seq(as.Date("1966/01/01"), as.Date("1966/06/30"), "days")), min=0, max=50)))

> head(prices)
   dailydate gvkey tic price
1 1966-01-01  1000 XTL    44
2 1966-01-02  1000 XTL    42
3 1966-01-03  1000 XTL    42
4 1966-01-04  1000 XTL    16
5 1966-01-05  1000 XTL    27
6 1966-01-06  1000 XTL    36

> tail(prices)
     dailydate gvkey tic price
357 1966-06-25  1001 ABL     0
358 1966-06-26  1001 ABL    28
359 1966-06-27  1001 ABL     4
360 1966-06-28  1001 ABL    18
361 1966-06-29  1001 ABL    49
362 1966-06-30  1001 ABL     4

问题:

1)如何合并这样的季度和日常数据集,以获得类似下面的数据框?

2)我如何计算季度平均价格并将价值分配给季度? (下表中的“average_quarterly_price”变量)

我想要一个像这样的合并数据框:

 dailydate   quarterlydates gvkey tic  price  sales average_quarterly_price
1 1966-01-01   1966-03-31    1000  XTL     1    70   32
2 1966-01-02   1966-03-31    1000  XTL    10    70   32
3 1966-01-03   1966-03-31    1000  XTL    14    70   32
4 1966-01-04   1966-03-31    1000  XTL    29    70   32
5 1966-01-05   1966-03-31    1000  XTL     1    70   32
6 1966-01-06   1966-03-31    1000  XTL    43    70   32
.
.
.
182 1966-04-01  1966-06-31   1000  XTL    11    75   41
183 1966-04-02  1966-06-31   1000  XTL    8     75   41 
184 1966-04-03  1966-06-31   1000  XTL    16    75   41
185 1966-04-04  1966-06-31   1000  XTL    14    75   41
186 1966-04-05  1966-06-31   1000  XTL    14    75   41
187 1966-04-06  1966-06-31   1000  XTL    20    75   41
.
.
.
364 1966-01-01  1966-03-31   1001  ABL    18    20   15
365 1966-01-02  1966-03-31   1001  ABL    10    20   15
366 1966-01-03  1966-03-31   1001  ABL    13    20   15
367 1966-01-04  1966-03-31   1001  ABL    13    20   15
368 1966-01-05  1966-03-31   1001  ABL    11    20   15
369 1966-01-06  1966-03-31   1001  ABL    13    20   15
.
.
.
545 1966-04-01  1966-06-31   1001  ABL    14    22   16
555 1966-04-02  1966-06-31   1001  ABL    21    22   16 
556 1966-04-03  1966-06-31   1001  ABL    18    22   16
557 1966-04-04  1966-06-31   1001  ABL    18    22   16
558 1966-04-05  1966-06-31   1001  ABL    17    22   16
559 1966-04-06  1966-06-31   1001  ABL    18    22   16
.
.
.
724 1966-06-31  1966-06-31  1001  ABL    22    22   16

当然我不确定这是否是最好的数据集格式,并希望得到建议。我的最终目的是能够在一次分析中同时使用每日和每季度数据。举例来说,我希望能够找到前20%百分位的季度资产回报率和过去10天每日价格最低的股票。

1 个答案:

答案 0 :(得分:1)

在每个数据框中创建一个"yearqtr"类列,然后使用公共列名执行两个数据框的左连接。最后使用ave计算平均值。

library(zoo) # yearqtr class

fundq <- transform(fund, yearqtr = as.yearqtr(quarterlydate))
pricesq <- transform(prices, yearqtr = as.yearqtr(dailydate))
m <- merge(pricesq, fundq, all.x = TRUE)
transform(m, avg_price = ave(price, tic, yearqtr))