Correlation matrix by date for a data frame

Time: 2019-02-13 23:15:01

Tags: r dplyr

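I have a data frame that contains factor columns and a date column. I am looking for the most efficient way to compute the correlation of every pair of factors for each date. Here is a sample of the data frame I am working with.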

structure(list(MktDate = structure(c(17865, 17865, 17865, 17896, 
17896, 17896, 17927, 17927, 17927), class = "Date"), Var1 = c(1, 
2, 3, 1, 2, 3, 1, 2, 3), Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5), 
    Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7)), class = "data.frame", row.names = c(NA, 
-9L))

I would like the resulting data frame to be in a format similar to the one shown below.

MktDate,FactorPair,Correl
2018-11-30,Var1Var2,-.32733
2018-11-30,Var1Var3,-1
2018-11-30,Var2Var3,.3273
2018-12-31,Var1Var2,-1
...

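I am guessing this can be done fairly easily with some form of dplyr and apply, but I am not sure how to do it without a bunch of nested loops.

Thanks for your help.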

2 Answers:

Answer 0: (score: 1)

Here is a more general solution that computes the correlation for every pair among n columns.

library(reshape2)
library(dplyr)

#Original Data
df_og = data.frame(MktDate = structure(c(17865, 17865, 17865, 17896, 
                                         17896, 17896, 17927, 17927, 17927), class = "Date"),
                   Var1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5),
                   Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7))

#Dataframe to store result
df_result = data.frame(MktDate = unique(df_og$MktDate))

#Create pairs of variables to eventually iterate over
combs = utils::combn(c("Var1","Var2","Var3"),2)%>%
  t()

#Convert to long format data frame and store elements in a vector for each date variable pair
df = df_og %>%
  melt(id.vars = "MktDate")%>%
  group_by(MktDate,variable)%>%
  summarise(val = list(value))%>%
  ungroup()

# Iterate over each combination
for(i in seq(1,nrow(combs))){
  combination = combs[i,] # Select the combination
  new_col_name = paste0(combination,collapse = "") #Define the new column name
  df_result = df %>%
    filter(variable %in% combination)%>% #Select only the variables in this combination
    dcast(MktDate~variable, value.var = "val")%>% #Convert back into "wide" format
    group_by(MktDate)%>% #Group by date so cor() is computed once per date
    # mutate_() and select_() are deprecated, so use tidy evaluation instead
    mutate(!!new_col_name := cor(unlist(.data[[combination[1]]]),
                                 unlist(.data[[combination[2]]])))%>% # Compute the correlation
    ungroup()%>%
    select(all_of(c("MktDate",new_col_name)))%>%
    inner_join(df_result,by = "MktDate") #Join with the result dataframe
}

# If required convert it back into a long format
df_result = df_result%>%
  melt(id.vars = "MktDate")%>%
  arrange(MktDate)

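The advantage of this code is that it is flexible. You can add new columns such as Var4, Var5 and Var6 simply by supplying the new column names to combn. combn generates every pair of variables, and the rest of the code computes the correlation between the two variables in each pair.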
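
For comparison, the same "supply the column names to combn" idea can be sketched in base R without any reshaping. This is only a minimal sketch, not the code from the answer above: the helper name pairwise_cor_by_date and its arguments are made up for illustration, and it assumes the date column is called MktDate and that every date has at least two rows with non-constant values.

```r
# Sample data from the question
df_og <- data.frame(
  MktDate = as.Date(c(17865, 17865, 17865, 17896, 17896, 17896,
                      17927, 17927, 17927), origin = "1970-01-01"),
  Var1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5),
  Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7)
)

# Hypothetical helper (not part of any package): returns one row per
# date and column pair. Assumes the date column is named "MktDate".
pairwise_cor_by_date <- function(data, vars) {
  pairs <- utils::combn(vars, 2)            # 2 x n_pairs character matrix
  by_date <- split(data, data$MktDate)      # one data frame per date
  pieces <- lapply(by_date, function(d) {
    data.frame(
      MktDate = d$MktDate[1],
      FactorPair = paste0(pairs[1, ], pairs[2, ]),
      # Pearson correlation for each pair of columns within this date
      Correl = mapply(function(a, b) cor(d[[a]], d[[b]]),
                      pairs[1, ], pairs[2, ], USE.NAMES = FALSE),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

res <- pairwise_cor_by_date(df_og, c("Var1", "Var2", "Var3"))
print(res)
```

Adding Var4, Var5 or Var6 would then only mean extending the vars argument, as long as those columns exist in the data.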

Answer 1: (score: 0)

I created a column to pair the reps together; if they are not all in triplicate, you will need to adjust this.

library(reshape)
df<-structure(list(MktDate = structure(c(17865, 17865, 17865, 17896, 
              17896, 17896, 17927, 17927, 17927), class = "Date"), Var1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3), 
              Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5), Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7)),
              class = "data.frame", row.names = c(NA,-9L))
df$rep<- rep(seq(1,3),3)

df.mut<-reshape(df, idvar = "MktDate", timevar = "rep", direction = "wide")

var1var2=apply(df.mut,1, function(x) cor(as.numeric(x[seq(2,10,3)]), as.numeric(x[seq(3,10,3)])))
var2var3=apply(df.mut,1, function(x) cor(as.numeric(x[seq(3,10,3)]), as.numeric(x[seq(4,10,3)])))
var1var3=apply(df.mut,1, function(x) cor(as.numeric(x[seq(2,10,3)]), as.numeric(x[seq(4,10,3)])))

# Each correlation vector holds one value per date, so repeat the dates once per pair
results <- data.frame(MktDate = rep(unique(df$MktDate), times = 3), FactorPair = rep(c("Var1Var2", "Var2Var3", "Var1Var3"), each =3 ),
                      cor= c(var1var2,var2var3,var1var3))
results <- results[order(results$MktDate),]
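
If the dates do not all have exactly three rows, one possible adjustment is to avoid the fixed column positions altogether. The sketch below is a different technique from the reshape approach above, given only as an assumption-laden illustration: it computes the full correlation matrix per date with cor() and keeps the upper triangle. The names vars, cor_list, upper and idx are introduced here for illustration.

```r
# Sample data from the question
df <- data.frame(
  MktDate = as.Date(c(17865, 17865, 17865, 17896, 17896, 17896,
                      17927, 17927, 17927), origin = "1970-01-01"),
  Var1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Var2 = c(3, 5, 2, 4, 3, 2, 1, 2, 5),
  Var3 = c(8, 7, 6, 9, 8, 9, 5, 8, 7)
)

vars <- c("Var1", "Var2", "Var3")

# One correlation matrix per date; works for any number of rows per date
# (at least two, with non-constant columns)
cor_list <- lapply(split(df[vars], df$MktDate), cor)

# Positions of the upper triangle, i.e. each unordered pair exactly once
upper <- upper.tri(cor_list[[1]])
idx <- which(upper, arr.ind = TRUE)

results <- do.call(rbind, lapply(names(cor_list), function(d) {
  m <- cor_list[[d]]
  data.frame(
    MktDate = as.Date(d),   # list names are the dates as "YYYY-MM-DD" strings
    FactorPair = paste0(rownames(m)[idx[, "row"]], colnames(m)[idx[, "col"]]),
    cor = m[upper],         # same column-major order as idx
    stringsAsFactors = FALSE
  )
}))
print(results)
```

Because the pairs are read off the matrix dimnames, adding more Var columns only requires extending vars.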