如何在R中创建协方差矩阵?

时间:2018-05-19 16:35:28

标签: r

我正在尝试从零开始构建协方差矩阵(cov()函数)。我的任务是不使用任何包。因此我创建了我的函数:

meanf <- function(x){
sum(x) / length(x)
}

sampleCov <- function(x,y){
  stopifnot(identical(length(x), length(y)))
  sum((x - meanf(x)) * (y - meanf(y))) / (length(x) - 1)
}

> sampleCov(winequality_red$quality, winequality_red$alcohol)
[1] 0.409789

不幸的是,我被困在这里。我试图应用的所有循环都没有任何意义。当然,可以只复制sampleCov函数并为每个可能的组合制作它,但这不是我的观点。

3 个答案:

答案 0 :(得分:1)

您需要矩阵乘法%*%

sampleCov <- function(x,y){
  stopifnot(identical(length(x), length(y)))
  sum((x - mean(x)) %*% (y - mean(y))) / (length(x) - 1)
}

> sampleCov(rnorm(10000),rnorm(10000))
[1] 0.01808466

答案 1 :(得分:1)

如果我理解正确,那么我相信您想重新创建一个协变量输出,就像cov函数返回的那样。

OP给定功能:

meanf <- function(x){
    sum(x) / length(x)
}

sampleCov <- function(x,y){
    stopifnot(identical(length(x), length(y)))
    sum((x - meanf(x)) * (y - meanf(y))) / (length(x) - 1)
}

您可以尝试这种方式,我在此处获取了mtcars数据:

协变量功能:

vars <- names(mtcars)
egrid <- expand.grid(vars, vars)
egrid <- data.frame(sapply(egrid, as.character),stringsAsFactors = F)
egrid <- egrid[order(egrid$Var1, egrid$Var2),]
mat <- vector("list", nrow(egrid))

for(i in 1:nrow(egrid)){
    mat[[i]] <- sampleCov(mtcars[,egrid[i,"Var1"]], mtcars[,egrid[i,"Var2"]])
}

finaldat <- cbind(egrid, cov = do.call('rbind', mat))
finaldat_list <- split(finaldat,  finaldat$Var1)
mat_form <- do.call('cbind', finaldat_list)

cov_values <- mat_form[,grepl("\\.cov",names(mat_form))]
col_values <- mat_form[,paste0(egrid$Var1[1],".Var2")]

final_matrix_cov <- cbind(col_values, cov_values)

示例输出:

> final_matrix_cov
    col_values       am.cov    carb.cov     cyl.cov    disp.cov
9          mpg   1.80393145 -5.36310484  -9.1723790  -633.09721
20         cyl  -0.46572581  1.52016129   3.1895161   199.66028
31        disp -36.56401210 79.06875000 199.6602823 15360.79983
42          hp  -8.32056452 83.03629032 101.9314516  6721.15867

答案 2 :(得分:1)

这可能比您需要的多一点,但它应该回答您的问题,我认为它是协方差,相关性等实际应用的一个很好的例证。

# load the data
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))

# calculate the necessary values:
# I) expected returns for the two assets
er_x <- mean(df$x)
er_y <- mean(df$y)

# II) risk (standard deviation) as a risk measure
sd_x <- sd(df$x)
sd_y <- sd(df$y)

# III) covariance
cov_xy <- cov(df$x, df$y)

# create 1000 portfolio weights (omegas)
x_weights <- seq(from = 0, to = 1, length.out = 1000)

# create a data.table that contains the weights for the two assets
two_assets <- data.table(wx = x_weights,
                         wy = 1 - x_weights)

# calculate the expected returns and standard deviations for the 1000 possible portfolios
two_assets[, ':=' (er_p = wx * er_x + wy * er_y,
                   sd_p = sqrt(wx^2 * sd_x^2 +
                               wy^2 * sd_y^2 +
                               2 * wx * (1 - wx) * cov_xy))]
two_assets

# lastly plot the values
ggplot() +
  geom_point(data = two_assets, aes(x = sd_p, y = er_p, color = wx)) +
  geom_point(data = data.table(sd = c(sd_x, sd_y), mean = c(er_x, er_y)),
  aes(x = sd, y = mean), color = "red", size = 3, shape = 18) +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Possible Portfolios with Two Risky Assets") +
  xlab("Volatility") + ylab("Expected Returns") +
  scale_y_continuous(label = percent, limits = c(0, max(two_assets$er_p) * 1.2)) +
  scale_x_continuous(label = percent, limits = c(0, max(two_assets$sd_p) * 1.2)) +
  scale_color_continuous(name = expression(omega[x]), labels = percent)

有关所有详细信息,请参阅以下链接。

https://datashenanigan.wordpress.com/2016/05/24/a-gentle-introduction-to-finance-using-r-efficient-frontier-and-capm-part-1/