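In SQLite, I want to find the standard deviation of the first differences of a (logged) series defined with GROUP BY. My data provider gives me daily price series, but I want the annual daily volatility: the standard deviation of daily returns (the first difference of the natural log of the series), computed for each year. I could bring the data into R and use ddply(), but I want to do this entirely in SQLite. I tried the difference() function from the RSQLite.extfuns package, but I am using it incorrectly. I expected it to work like diff() in R, but I couldn't find much documentation.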
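To make the target quantity concrete, here is a minimal base-R sketch of the volatility for a single (stock, year) group. The price vector is hypothetical toy data, not from the post; the point is that diff(log(p)) gives the daily log returns and sd() of those is the per-year daily volatility that the query should reproduce for every group.

```r
# Hypothetical toy price series for one stock in one fiscal year
p <- c(100, 102, 101, 105, 104)

# Daily log returns: first difference of the natural log of prices,
# i.e. log(p[t]) - log(p[t - 1]); one fewer value than there are prices
r <- diff(log(p))

# Daily volatility for that year: sample standard deviation of the returns
vol <- sd(r)
print(vol)
```

Note that each return depends on the previous row's price, which is why a plain SQL aggregate over one column cannot compute it directly.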
This generates some data:
stocks <- 5
years <- 5
list.n <- as.list(rep(252, stocks * years))
list.mean <- as.list(rep(0, stocks * years))
list.sd <- as.list(abs(runif(stocks * years, min = 0, max = 0.1)))
list.po <- as.list(runif(n = stocks, min = 25, max = 100))
list.ret <- mapply(rnorm, n = list.n, mean = list.mean, sd = list.sd, SIMPLIFY = F)
my.price <- function(po, ret) po * exp(cumsum(ret))
list.price <- mapply(my.price, po = list.po, ret = list.ret, SIMPLIFY = F)
gvkey <- rep(seq(stocks), each = 252 * years)
day <- rep(seq(252), times = stocks * years)
fyr <- rep(seq(years), times = stocks, each = 252)
data.dly <- data.frame(gvkey, fyr, day, p = unlist(list.price))
Here is how I do it with ddply(), and the result:
# I could do this easily with ddply and subset
library(plyr)
data.dly <- ddply(data.dly, .(gvkey, fyr), transform, vol = sd(diff(log(p))))
data.ann <- subset(data.dly, day == 252)
head(data.ann)
gvkey fyr day p vol
252 1 1 252 86.08568 0.077287182
504 1 2 252 43.32113 0.066741862
756 1 3 252 68.69734 0.084419564
1008 1 4 252 75.37267 0.006003969
1260 1 5 252 17.53583 0.083688727
1512 2 1 252 168.44656 0.035959492
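For comparison, the same per-(gvkey, fyr) computation can be sketched in base R without plyr. This is a minimal illustration on a hypothetical miniature data frame laid out like data.dly (2 stocks x 2 years x 4 days); it assumes rows are sorted by day within each group, since diff() depends on row order.

```r
# Hypothetical miniature version of data.dly: 2 stocks x 2 years x 4 days
toy <- data.frame(
  gvkey = rep(1:2, each = 8),
  fyr   = rep(rep(1:2, each = 4), times = 2),
  day   = rep(1:4, times = 4),
  p     = c(10, 11, 10.5, 12,  20, 19, 21, 22,
            30, 31, 33, 32,    40, 42, 41, 43)
)

# Make sure prices are in day order inside each (gvkey, fyr) group
toy <- toy[order(toy$gvkey, toy$fyr, toy$day), ]

# ave() recycles the group-level result onto every row of the group,
# mirroring ddply(..., transform, vol = sd(diff(log(p))))
toy$vol <- ave(toy$p, toy$gvkey, toy$fyr, FUN = function(x) sd(diff(log(x))))

# Keep one row per group (the last day), like subset(data.dly, day == 252)
toy.ann <- toy[toy$day == 4, ]
print(toy.ann)
```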
Here is my (failed) SQLite attempt and the error:
# but I can't figure it out in SQLite
library(RSQLite)
library(RSQLite.extfuns)
db <- dbConnect(SQLite())
init_extensions(db)
[1] TRUE
dbWriteTable(db, name = "data_dly", value = data.dly)
[1] TRUE
temp <- dbGetQuery(db, "SELECT stdev(difference(log(p))) FROM data_dly GROUP BY gvkey, fyr ORDER BY gvkey, fyr, day")
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: wrong number of arguments to function difference())
Does difference() expect a comma-separated list of numbers? Can I do this entirely in SQLite, or do I need to do it in R? Thanks!
Answer 0 (score: 2)
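The difference() SQL function takes two character (string) arguments and means something different from R's diff().
Use SQL to retrieve the data, then do the statistics in R:
temp <- dbGetQuery(db, "SELECT gvkey, fyr, day, p FROM data_dly ORDER BY gvkey, fyr, day")
# one volatility per (gvkey, fyr): sd of the first differences of the log prices
vol <- aggregate(p ~ gvkey + fyr, data = temp, FUN = function(x) sd(diff(log(x))))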
Answer 1 (score: 2)
Try this, with data.dly being the data frame from the post:
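library(sqldf)
# self-join: pair each day's price (A) with the previous day's price (B)
# for the same gvkey and fyr, then take the sd of the log differences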
out <- sqldf("select A.gvkey, A.fyr, stdev(log(A.p) - log(B.p)) vol
from `data.dly` A join `data.dly` B
where A.day = B.day + 1
and A.gvkey = B.gvkey
and A.fyr = B.fyr
group by A.gvkey, A.fyr")
This gives:
> head(out)
gvkey fyr vol
1 1 1 0.09312510
2 1 2 0.01905447
3 1 3 0.01651095
4 1 4 0.06962667
5 1 5 0.05243940
6 2 1 0.03039751
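To see why the self-join reproduces R's diff(), here is a minimal base-R sketch of the same join using merge() on hypothetical toy data (not the post's data). Shifting a copy's day forward by one plays the role of the condition A.day = B.day + 1, so each row pairs a price with the previous day's price; day 1 has no partner and drops out, exactly as diff() yields one fewer value.

```r
# Hypothetical miniature data in the same layout as data.dly
toy <- data.frame(
  gvkey = rep(1:2, each = 4),
  fyr   = 1,
  day   = rep(1:4, times = 2),
  p     = c(10, 11, 10.5, 12,  30, 31, 33, 32)
)

# "B" copy with day shifted by one, so B's day t - 1 lines up with A's day t
B <- transform(toy, day = day + 1)

# Inner join on (gvkey, fyr, day): p.A is the price on day t, p.B on day t - 1
AB <- merge(toy, B, by = c("gvkey", "fyr", "day"), suffixes = c(".A", ".B"))

# Log return per matched pair, then the standard deviation per group
AB$r <- log(AB$p.A) - log(AB$p.B)
vol.join <- aggregate(r ~ gvkey + fyr, data = AB, FUN = sd)

# The diff()-based computation on the same toy data, for comparison
vol.diff <- aggregate(p ~ gvkey + fyr, data = toy, FUN = function(x) sd(diff(log(x))))
print(cbind(vol.join, diff.vol = vol.diff$p))
```

On the same data the two columns agree, which is what makes the SQL self-join a drop-in replacement for sd(diff(log(p))) per group.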