Calling lm() inside mutate()

Date: 2014-07-15 19:21:09

Tags: r lm dplyr

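I'd like to know whether lm() can be used inside mutate() from the dplyr package. I currently have a data frame with the columns "date", "company", "return" and "market.ret", which can be reproduced as follows: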

library(dplyr)
n.dates <- 60
n.stocks <- 2
date <- seq(as.Date("2011-07-01"), by = 1, len = n.dates)
symbol <- replicate(n.stocks, paste0(sample(LETTERS, 5), collapse = ""))
x <- expand.grid(date, symbol)
x$return <- rnorm(n.dates * n.stocks, 0, sd = 0.05)
names(x) <- c("date", "company", "return")
x <- group_by(x, date)
# use the bare column name: x$return would average over all rows and ignore the date groups
x <- mutate(x, market.ret = mean(return, na.rm = TRUE))

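Now, for each company, I want to regress "return" on "market.ret", compute the linear regression coefficients, and store the slope in a new column. I'd like to do this with mutate(), but the following code does not work: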

x <- group_by(x, company)
x <- mutate(x, beta = coef(lm(x$return~x$market.ret))[[2]])

The error R reports is:

Error in terms.formula(formula, data = data) : 
invalid term in model formula

Thanks in advance for any suggestions!

2 Answers:

Answer 0 (score: 7)

This seems to work for me:

# fit one regression per company, keep the slope, then join it back onto x
group_by(x, company) %>%
    do(data.frame(beta = coef(lm(return ~ market.ret, data = .))[2])) %>%
    left_join(x, ., by = "company")
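Newer dplyr versions mark do() as superseded. The sketch below expresses the same idea (one lm() per company, then a join) with group_modify() instead of do(). It assumes the data are simulated as in the question, with set.seed(1L) added and market.ret computed per date from the bare column name; the intermediate name betas is just for illustration.

```r
library(dplyr)

set.seed(1L)  # reproducible example
n.dates  <- 60
n.stocks <- 2
date   <- seq(as.Date("2011-07-01"), by = 1, length.out = n.dates)
symbol <- replicate(n.stocks, paste0(sample(LETTERS, 5), collapse = ""))
x <- expand.grid(date = date, company = symbol)
x$return <- rnorm(n.dates * n.stocks, 0, sd = 0.05)

# Daily market return: mean of all companies' returns on each date
x <- x %>%
  group_by(date) %>%
  mutate(market.ret = mean(return, na.rm = TRUE)) %>%
  ungroup()

# One regression per company; .x holds that company's rows (grouping column dropped)
betas <- x %>%
  group_by(company) %>%
  group_modify(~ data.frame(beta = coef(lm(return ~ market.ret, data = .x))[[2]])) %>%
  ungroup()

# Attach each company's slope to every row of that company
x <- left_join(x, betas, by = "company")
```

betas has one row per company, and the join repeats each slope across that company's dates, which matches what the do() version produces.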

Answer 1 (score: 2)

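It looks like you want to compute the daily market return across all companies, and then, for each company, regress its returns on the market return over all dates. If so, here is a solution using data.table; it may be faster on very large data sets.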

library(data.table) ## 1.9.2+
setDT(x)[ , market.ret := mean(return), by = date]   # daily market return, added by reference
x[, beta := coef(lm(return ~ market.ret, data = .SD))[[2]], by = company]   # per-company slope

where x is created as follows (using set.seed for reproducibility):

set.seed(1L)     # for reproducible example
n.dates <- 60
n.stocks <- 2
date <- seq(as.Date("2011-07-01"), by=1, len=n.dates)
symbol <- replicate(n.stocks, paste0(sample(LETTERS, 5), collapse = ""))
x <- expand.grid(date, symbol)
x$return <- rnorm(n.dates*n.stocks, 0, sd = 0.05)
names(x) <- c("date", "company", "return")
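Since the question asks specifically about mutate(): with a single predictor, the least-squares slope equals cov(return, market.ret) / var(market.ret), so the beta can be computed inside a grouped mutate() without calling lm() at all. A minimal sketch using the same simulated data, assuming there are no missing values (with NAs, cov() and var() would need matching NA handling to agree with lm()):

```r
library(dplyr)

set.seed(1L)  # reproducible example
n.dates  <- 60
n.stocks <- 2
date   <- seq(as.Date("2011-07-01"), by = 1, length.out = n.dates)
symbol <- replicate(n.stocks, paste0(sample(LETTERS, 5), collapse = ""))
x <- expand.grid(date = date, company = symbol)
x$return <- rnorm(n.dates * n.stocks, 0, sd = 0.05)

x <- x %>%
  # Daily market return across companies
  group_by(date) %>%
  mutate(market.ret = mean(return)) %>%
  # Re-grouping by company replaces the date grouping
  group_by(company) %>%
  # Simple-regression OLS slope, computed once per company and recycled to its rows
  mutate(beta = cov(return, market.ret) / var(market.ret)) %>%
  ungroup()
```

With two companies, market.ret is the average of the two return series, so the two betas sum to exactly 2 (up to floating-point error), which makes a quick sanity check.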