Using RSQLite to find the standard deviation of the first difference of series defined with GROUP BY

Time: 2011-07-14 15:20:58

Tags: sqlite r group-by

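In SQLite, I would like to find the standard deviation of the first difference of (logged) series that I define with GROUP BY. My data provider gives me daily price series, but I want the annualized daily volatility (the standard deviation of daily returns, i.e. the first difference of the natural log of the series, by year). I could bring the data into R and use ddply(), but I would like to do this entirely in SQLite. I tried the difference() function from the RSQLite.extfuns package, but my usage is wrong. I hoped it would work like diff() in R, but I can't find much documentation.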
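To make the target quantity concrete, here is a minimal sketch in R for a single stock and a single period. The price vector is made up for illustration only; the names p, ret and vol are just placeholders.

```r
# Hypothetical prices for one stock over four days (illustrative values only)
p <- c(100, 102, 101, 105)

# Daily log returns: first difference of the natural log of the price
ret <- diff(log(p))

# Volatility for the period: sample standard deviation of the daily log returns
vol <- sd(ret)
```

The question is how to get this same number per (gvkey, fyr) group from an SQL query.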

This generates some data.

# simulate 252 daily prices for each stock-year
stocks <- 5
years <- 5
list.n <- as.list(rep(252, stocks * years))
list.mean <- as.list(rep(0, stocks * years))
list.sd <- as.list(abs(runif(stocks * years, min = 0, max = 0.1)))
list.po <- as.list(runif(n = stocks, min = 25, max = 100))
list.ret <- mapply(rnorm, n = list.n, mean = list.mean, sd = list.sd, SIMPLIFY = FALSE)
# price path from a starting price and a vector of log returns
my.price <- function(po, ret) po * exp(cumsum(ret))
list.price <- mapply(my.price, po = list.po, ret = list.ret, SIMPLIFY = FALSE)
# identifiers: rep() takes `times`, not `n` (an unknown `n` is silently ignored)
gvkey <- rep(seq(stocks), each = 252 * years)
day <- rep(seq(252), times = stocks * years)
fyr <- rep(seq(years), times = stocks, each = 252)
data.dly <- data.frame(gvkey, fyr, day, p = unlist(list.price))

Here is how I do it with ddply(), and the result.

# I could do this easily with ddply and subset
library(plyr)
data.dly <- ddply(data.dly, .(gvkey, fyr), transform, vol = sd(diff(log(p))))
data.ann <- subset(data.dly, day == 252)
head(data.ann)
     gvkey fyr day         p         vol
252      1   1 252  86.08568 0.077287182
504      1   2 252  43.32113 0.066741862
756      1   3 252  68.69734 0.084419564
1008     1   4 252  75.37267 0.006003969
1260     1   5 252  17.53583 0.083688727
1512     2   1 252 168.44656 0.035959492

Here is my (failed) SQLite attempt and the error.

# but I can't figure it out in SQLite
library(RSQLite)
library(RSQLite.extfuns)
db <- dbConnect(SQLite())
init_extensions(db)
[1] TRUE
dbWriteTable(db, name = "data_dly", value = data.dly)
[1] TRUE
temp <- dbGetQuery(db, "SELECT stdev(difference(log(p))) FROM data_dly GROUP BY gvkey, fyr ORDER BY gvkey, fyr, day")
Error in sqliteExecStatement(con, statement, bind.data) : 
  RS-DBI driver: (error in statement: wrong number of arguments to function difference())

Does difference() need a comma-separated list of numbers? Can I do this entirely in SQLite, or do I need to do it in R? Thanks!

2 Answers:

Answer 0: (score: 2)

The difference SQL function takes two character arguments and means something different from R's diff function.

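Retrieve the data with an SQL command, then compute the statistics in R.

temp <- dbGetQuery(db, "SELECT gvkey, fyr, day, p FROM data_dly ORDER BY gvkey, fyr, day")
# standard deviation of daily log returns within each (gvkey, fyr) group
vol <- aggregate(p ~ gvkey + fyr, data = temp, FUN = function(x) sd(diff(log(x))))
names(vol)[3] <- "vol"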

Answer 1: (score: 2)

Try this, using data.dly from the post as the data frame:

library(sqldf)
out <- sqldf("select A.gvkey, A.fyr, stdev(log(A.p) - log(B.p)) vol
    from `data.dly` A join `data.dly` B 
    where A.day = B.day + 1 
        and A.gvkey = B.gvkey 
        and A.fyr = B.fyr 
    group by A.gvkey, A.fyr")

This gives:

> head(out)
  gvkey fyr        vol
1     1   1 0.09312510
2     1   2 0.01905447
3     1   3 0.01651095
4     1   4 0.06962667
5     1   5 0.05243940
6     2   1 0.03039751
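The same self-join can also be run through RSQLite directly, without sqldf. The following is only a sketch under stated assumptions: it uses a small made-up table (toy), it relies on RSQLite's initExtension() to provide stdev(), and it stores the natural log of the price in a column lp computed in R, because the base of SQL log() can differ between SQLite's built-in math functions and the extension functions. The names toy, lp, vol_sql and vol_r are hypothetical.

```r
library(DBI)
library(RSQLite)

# Small made-up data set: 2 stocks x 2 years x 4 days (illustrative only)
toy <- expand.grid(day = 1:4, fyr = 1:2, gvkey = 1:2)
toy$p <- c(100, 102, 101, 105, 50, 49, 51, 52,
           20, 21, 19, 22, 80, 82, 81, 79)
toy$lp <- log(toy$p)  # natural log computed in R (assumption, see lead-in)

db <- dbConnect(SQLite(), ":memory:")
initExtension(db)  # loads the bundled "math" extension, which provides stdev()
dbWriteTable(db, "data_dly", toy)

# Self-join each day to the previous day within the same stock and year,
# then take the standard deviation of the log-price differences per group
vol_sql <- dbGetQuery(db, "
  SELECT A.gvkey, A.fyr, stdev(A.lp - B.lp) AS vol
  FROM data_dly A
  JOIN data_dly B
    ON A.gvkey = B.gvkey AND A.fyr = B.fyr AND A.day = B.day + 1
  GROUP BY A.gvkey, A.fyr
  ORDER BY A.gvkey, A.fyr")

# Cross-check against the equivalent computation in R
vol_r <- aggregate(lp ~ gvkey + fyr, data = toy,
                   FUN = function(x) sd(diff(x)))
vol_r <- vol_r[order(vol_r$gvkey, vol_r$fyr), ]

dbDisconnect(db)
```

Because day restarts at 1 in every year, the join condition A.day = B.day + 1 never pairs the first day of a year with anything, so each group contributes one fewer difference than it has rows, just as diff() does within a group.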