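Using the FANG data set, I know there is a smooth relationship between trading volume and opening price. I also know that the most predictive rolling-mean length differs from stock to stock: for some it is short, around 2 days, and for others it is 10. I would like to create, for each stock, multiple rolling means with window lengths between 2 and 10 days.
So far I have tried the tibbletime package and made a start: I can compute multiple rolling means for a single stock.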
library(tibbletime)
library(tidyverse)
data("FANG")
FB <- FANG %>% filter(symbol == "FB")
meanstep <- seq(2, 10, 1)
col_names <- map_chr(meanstep, ~paste0("rollmean_", .x))
rollers <- map(meanstep, ~rollify(mean, window = .x)) %>% set_names(nm = col_names)
FB_multiroll<- bind_cols(FB, invoke_map(rollers, x = FB$volume))
However, I can't figure out how to make this work when grouping by multiple stocks.
I tried adding:
FANG_with_multiroll <- FANG %>% group_by(symbol) %>% bind_cols(FANG, invoke_map(rollers, x = FANG$volume))
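But this did not work. It creates the rolling means, but not by group; instead, it runs over the entire data frame regardless of `symbol`. Any ideas would be greatly appreciated. Once I get this working, I plan to find the highest correlation or R-squared for each symbol. If you have ideas for a better way to do that as well, I'm interested.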
Answer 0 (score: 0)
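The OP tagged the question with dplyr and purrr, but six months later no answer has addressed it.
With version 1.12.0 (released on CRAN on 13 January 2019), the data.table package gained the frollmean() function, which can be used to create multiple rolling means of different lengths by group.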
data(FANG, package = "tibbletime")
library(data.table) # version 1.12.0 +
meanstep <- 2:10
FANG_with_multiroll <- as.data.table(FANG)[
, sprintf("rollmean_%02i", meanstep) := frollmean(volume, meanstep), by = symbol][]
FANG_with_multiroll
      symbol       date   open   high     low  close   volume adjusted rollmean_02 rollmean_03
   1:     FB 2013-01-02  27.44  28.18  27.420  28.00 69846400    28.00          NA          NA
   2:     FB 2013-01-03  27.88  28.47  27.590  27.77 63140600    27.77    66493500          NA
   3:     FB 2013-01-04  28.01  28.93  27.830  28.76 72715400    28.76    67928000  68567466.7
   4:     FB 2013-01-07  28.69  29.79  28.650  29.42 83781800    29.42    78248600  73212600.0
   5:     FB 2013-01-08  29.51  29.60  28.860  29.06 45871300    29.06    64826550  67456166.7
  ---
4028:   GOOG 2016-12-23 790.90 792.74 787.280 789.91   623400   789.91      796250    933733.3
4029:   GOOG 2016-12-27 790.68 797.86 787.657 791.55   789100   791.55      706250    793866.7
4030:   GOOG 2016-12-28 793.70 794.23 783.200 785.05  1132700   785.05      960900    848400.0
4031:   GOOG 2016-12-29 783.33 785.93 778.920 782.79   742200   782.79      937450    888000.0
4032:   GOOG 2016-12-30 782.75 782.78 770.410 771.82  1760200   771.82     1251200   1211700.0
      rollmean_04 rollmean_05 rollmean_06 rollmean_07 rollmean_08 rollmean_09 rollmean_10
   1:          NA          NA          NA          NA          NA          NA          NA
   2:          NA          NA          NA          NA          NA          NA          NA
   3:          NA          NA          NA          NA          NA          NA          NA
   4:    72371050          NA          NA          NA          NA          NA          NA
   5:    66377275    67071100          NA          NA          NA          NA          NA
  ---
4028:      931575      990440   1230083.3     1286314     1333588     1420944     1488560
4029:      897575      903080    956883.3     1167086     1224163     1273089     1357760
4030:      878575      944600    941350.0      982000     1162788     1214000     1259050
4031:      821850      851300    910866.7      912900      952025     1116056     1166820
4032:     1106050     1009520   1002783.3     1032200     1018813     1041822     1180470
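As a minimal sketch of why a single frollmean() call can fill several columns at once: when the window argument is a vector of lengths, the function returns a list with one result per window, and `:=` assigns each list element to the matching column name. The vector `x` below is toy data made up for illustration, not taken from FANG, and the defaults (right-aligned window, NA fill) are assumed.

```r
library(data.table)  # frollmean() requires version 1.12.0 or later

# Toy input, invented for illustration only
x <- c(10, 20, 30, 40, 50)

# A vector of window lengths yields a list: one rolling mean per window.
# By default the window is right-aligned and incomplete windows are NA.
res <- frollmean(x, c(2, 3))

res[[1]]  # window of 2: first value is NA
res[[2]]  # window of 3: first two values are NA
```

The leading NA values in each rollmean column of the output above follow from the same default: a window of length k has no complete set of observations until row k of each group.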
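To show that the rolling means are computed separately for each group, we can print the first few rows of each group (and only the first ten columns):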
FANG_with_multiroll[, head(.SD, 3), .SDcols = 1:10, by = symbol]
    symbol symbol       date     open     high      low    close   volume  adjusted rollmean_02 rollmean_03
 1:     FB     FB 2013-01-02  27.4400  28.1800  27.4200  28.0000 69846400  28.00000          NA          NA
 2:     FB     FB 2013-01-03  27.8800  28.4700  27.5900  27.7700 63140600  27.77000    66493500          NA
 3:     FB     FB 2013-01-04  28.0100  28.9300  27.8300  28.7600 72715400  28.76000    67928000    68567467
 4:   AMZN   AMZN 2013-01-02 256.0800 258.1000 253.2600 257.3100  3271000 257.31000          NA          NA
 5:   AMZN   AMZN 2013-01-03 257.2700 260.8800 256.3700 258.4800  2750900 258.48001     3010950          NA
 6:   AMZN   AMZN 2013-01-04 257.5800 259.8000 256.6500 259.1500  1874200 259.14999     2312550     2632033
 7:   NFLX   NFLX 2013-01-02  95.2100  95.8100  90.6900  92.0100 19431300  13.14429          NA          NA
 8:   NFLX   NFLX 2013-01-03  91.9700  97.9200  91.5300  96.5900 27912500  13.79857    23671900          NA
 9:   NFLX   NFLX 2013-01-04  96.5400  97.7100  95.5400  95.9800 17761100  13.71143    22836800    21701633
10:   GOOG   GOOG 2013-01-02 719.4212 727.0013 716.5512 723.2512  5101500 361.26435          NA          NA
11:   GOOG   GOOG 2013-01-03 724.9313 731.9312 720.7212 723.6713  4653700 361.47415     4877600          NA
12:   GOOG   GOOG 2013-01-04 729.3412 741.4713 727.6812 737.9713  5547600 368.61701     5100650     5100933
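For the follow-up step mentioned in the question (finding the window with the highest correlation per symbol), one possible approach is to reshape the rolling-mean columns to long format and compute the correlation with `open` by symbol and window. This is only a sketch under stated assumptions: the `toy` table is randomly generated stand-in data with the same column names as FANG, the names `long`, `cors`, `best`, `window`, `roll_volume` and `r` are made up here, and picking the window by largest absolute Pearson correlation is an assumed criterion rather than something the question specifies.

```r
library(data.table)  # frollmean() requires version 1.12.0 or later

# Stand-in data (assumption): random values, same column names as FANG
set.seed(1)
toy <- data.table(
  symbol = rep(c("AAA", "BBB"), each = 30),
  date   = rep(as.Date("2020-01-01") + 0:29, times = 2),
  open   = runif(60, 10, 20),
  volume = runif(60, 1e5, 1e6)
)

meanstep  <- 2:10
roll_cols <- sprintf("rollmean_%02i", meanstep)

# Rolling means of volume for every window length, computed per symbol
toy[, (roll_cols) := frollmean(volume, meanstep), by = symbol]

# Long format: one row per symbol, date and window length
long <- melt(toy, id.vars = c("symbol", "date", "open"),
             measure.vars = roll_cols,
             variable.name = "window", value.name = "roll_volume")

# Correlation between open and each rolling mean, ignoring the leading NAs
cors <- long[, .(r = cor(open, roll_volume, use = "complete.obs")),
             by = .(symbol, window)]

# Per symbol, keep the window with the largest absolute correlation
best <- cors[, .SD[which.max(abs(r))], by = symbol]
best
```

Replacing `toy` with `as.data.table(FANG)` should apply the same steps to the real data. Since a simple linear regression of `open` on one rolling mean has an R-squared equal to `r^2`, ranking by absolute correlation and ranking by R-squared pick the same window.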