我正在尝试从零开始构建协方差矩阵(cov()函数)。我的任务是不使用任何包。因此我创建了我的函数:
meanf <- function(x){
sum(x) / length(x)
}
sampleCov <- function(x,y){
stopifnot(identical(length(x), length(y)))
sum((x - meanf(x)) * (y - meanf(y))) / (length(x) - 1)
}
> sampleCov(winequality_red$quality, winequality_red$alcohol)
[1] 0.409789
不幸的是,我被困在这里。我试图应用的所有循环都没有任何意义。当然,可以只复制sampleCov函数并为每个可能的组合制作它,但这不是我的观点。
答案 0 :(得分:1)
您需要矩阵乘法%*%
。
sampleCov <- function(x,y){
stopifnot(identical(length(x), length(y)))
sum((x - mean(x)) %*% (y - mean(y))) / (length(x) - 1)
}
> sampleCov(rnorm(10000),rnorm(10000))
[1] 0.01808466
答案 1 :(得分:1)
如果我理解正确,那么我相信您想重新创建一个协变量输出,就像cov
函数返回的那样。
OP给定功能:
meanf <- function(x){
sum(x) / length(x)
}
sampleCov <- function(x,y){
stopifnot(identical(length(x), length(y)))
sum((x - meanf(x)) * (y - meanf(y))) / (length(x) - 1)
}
您可以尝试这种方式,我在此处获取了mtcars
数据:
协变量功能:
vars <- names(mtcars)
egrid <- expand.grid(vars, vars)
egrid <- data.frame(sapply(egrid, as.character),stringsAsFactors = F)
egrid <- egrid[order(egrid$Var1, egrid$Var2),]
mat <- vector("list", nrow(egrid))
for(i in 1:nrow(egrid)){
mat[[i]] <- sampleCov(mtcars[,egrid[i,"Var1"]], mtcars[,egrid[i,"Var2"]])
}
finaldat <- cbind(egrid, cov = do.call('rbind', mat))
finaldat_list <- split(finaldat, finaldat$Var1)
mat_form <- do.call('cbind', finaldat_list)
cov_values <- mat_form[,grepl("\\.cov",names(mat_form))]
col_values <- mat_form[,paste0(egrid$Var1[1],".Var2")]
final_matrix_cov <- cbind(col_values, cov_values)
示例输出:
> final_matrix_cov
col_values am.cov carb.cov cyl.cov disp.cov
9 mpg 1.80393145 -5.36310484 -9.1723790 -633.09721
20 cyl -0.46572581 1.52016129 3.1895161 199.66028
31 disp -36.56401210 79.06875000 199.6602823 15360.79983
42 hp -8.32056452 83.03629032 101.9314516 6721.15867
答案 2 :(得分:1)
这可能比您需要的多一点,但它应该回答您的问题,我认为它是协方差,相关性等实际应用的一个很好的例证。
# load the data
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
# calculate the necessary values:
# I) expected returns for the two assets
er_x <- mean(df$x)
er_y <- mean(df$y)
# II) risk (standard deviation) as a risk measure
sd_x <- sd(df$x)
sd_y <- sd(df$y)
# III) covariance
cov_xy <- cov(df$x, df$y)
# create 1000 portfolio weights (omegas)
x_weights <- seq(from = 0, to = 1, length.out = 1000)
# create a data.table that contains the weights for the two assets
two_assets <- data.table(wx = x_weights,
wy = 1 - x_weights)
# calculate the expected returns and standard deviations for the 1000 possible portfolios
two_assets[, ':=' (er_p = wx * er_x + wy * er_y,
sd_p = sqrt(wx^2 * sd_x^2 +
wy^2 * sd_y^2 +
2 * wx * (1 - wx) * cov_xy))]
two_assets
# lastly plot the values
ggplot() +
geom_point(data = two_assets, aes(x = sd_p, y = er_p, color = wx)) +
geom_point(data = data.table(sd = c(sd_x, sd_y), mean = c(er_x, er_y)),
aes(x = sd, y = mean), color = "red", size = 3, shape = 18) +
# Miscellaneous Formatting
theme_bw() + ggtitle("Possible Portfolios with Two Risky Assets") +
xlab("Volatility") + ylab("Expected Returns") +
scale_y_continuous(label = percent, limits = c(0, max(two_assets$er_p) * 1.2)) +
scale_x_continuous(label = percent, limits = c(0, max(two_assets$sd_p) * 1.2)) +
scale_color_continuous(name = expression(omega[x]), labels = percent)
有关所有详细信息,请参阅以下链接。