dplyr: storing a vector across multiple variables

Time: 2018-10-05 12:46:43

Tags: r dplyr

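I am performing an eigendecomposition of the posterior distribution of a 4x4 variance-covariance matrix. To do this, I call the eigen function inside a dplyr / tidyverse pipeline: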

library(dplyr)

set.seed(1)
# Variance and covariances of 4 variables
A1  <- rnorm(1000,10,1)
A2  <- rnorm(1000,10,1)
A3  <- rnorm(1000,10,1)
A4  <- rnorm(1000,10,1)
C12 <- rnorm(1000,0,1)
C13 <- rnorm(1000,0,1)
C14 <- rnorm(1000,0,1)
C23 <- rnorm(1000,0,1)
C24 <- rnorm(1000,0,1)
C34 <- rnorm(1000,0,1)

# Create posterior tibble
w1_post <- as_tibble(cbind(A1, C12, C13, C14, A2, C23, C24, A3, C34, A4))

# Get 1st-4th eigenvalues of each variance-covariance matrix
w1_post %>%
  rowwise %>%
    mutate(
      eig1 = 
        eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
          A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][1],
      eig2 = 
        eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
          A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][2],
      eig3 = 
        eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
          A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][3],
      eig4 = 
        eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
          A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][4]) %>%
  select(starts_with('eig')) -> eig_post

This produces:

> eig_post
Source: local data frame [1,000 x 4]
Groups: <by row>

# A tibble: 1,000 x 4
    eig1  eig2  eig3  eig4
   <dbl> <dbl> <dbl> <dbl>
 1  12.3 11.0  10.4   6.67
 2  12.8 10.1   9.19  7.61
 3  13.5 12.2   8.20  7.34
 4  12.7 12.2   8.91  7.68
 5  12.9  9.70  9.41  6.74
 6  12.2 10.6   8.62  7.70
 7  13.1 12.5   9.21  8.34
 8  12.9  9.76  7.87  6.96
 9  12.8 11.6   8.21  6.46
10  12.5 11.6   9.85  8.13
# ... with 990 more rows
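To see what the `[[1]][1]` indexing picks out, here is a small check on a single made-up draw. `build_vcov()` is a hypothetical helper, not part of the original post, and the parameter values are invented. `eigen()` returns a list whose first component is `values`; for a symmetric matrix those values come back in decreasing order (so `[[1]][1]` is the largest eigenvalue), and they sum to the trace of the matrix.

```r
# Hypothetical helper (not from the original post): build the symmetric
# 4x4 variance-covariance matrix from the ten parameters of one draw,
# in the same column-major order the question uses.
build_vcov <- function(A1, C12, C13, C14, A2, C23, C24, A3, C34, A4) {
  matrix(c(A1,  C12, C13, C14,
           C12, A2,  C23, C24,
           C13, C23, A3,  C34,
           C14, C24, C34, A4), nrow = 4)
}

# Made-up parameter values for a single posterior draw
m <- build_vcov(A1 = 10.2, C12 = 0.3, C13 = -0.1, C14 = 0.5,
                A2 = 9.8,  C23 = 0.2, C24 = -0.4,
                A3 = 10.1, C34 = 0.6, A4 = 9.9)
e <- eigen(m)

# [[1]] in the question is the same component as $values
stopifnot(identical(e[[1]], e$values))
# Eigenvalues of a symmetric matrix are returned in decreasing order
stopifnot(!is.unsorted(rev(e$values)))
# For a symmetric matrix the eigenvalues sum to the trace
stopifnot(isTRUE(all.equal(sum(e$values), sum(diag(m)))))
```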

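As you can see, this performs the eigendecomposition four times per row, four times as often as necessary, and it slows my script down! Can I mutate several variables at once in a dplyr / tidyverse pipeline, spreading the vector produced by eigen(*matrix*)[[1]][1:4] across four variables? In other words, I need the result that the code above produces, but with only one eigendecomposition per row. I thought something like this would work, but no luck: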

w1_post %>%
  rowwise %>%
    mutate(c(eig1, eig2, eig3, eig4) = 
      eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
        A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][1:4]) %>%
  select(starts_with('eig')) -> eig_post
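The attempted `c(eig1, eig2, eig3, eig4) = ...` is not valid R syntax, since an argument name cannot be a function call. One way to get four columns from a single decomposition per row, assuming dplyr 1.0.0 or later, is to pass `mutate()` an unnamed argument that returns a one-row data frame; dplyr then unpacks that data frame into separate columns. `row_eigenvalues()` is a hypothetical helper, and the five-row `w1_post` below is made-up data in the question's column layout.

```r
library(dplyr)

# Small made-up posterior with the same column layout as w1_post
set.seed(1)
n <- 5
w1_post <- tibble(
  A1 = rnorm(n, 10, 1), C12 = rnorm(n), C13 = rnorm(n), C14 = rnorm(n),
  A2 = rnorm(n, 10, 1), C23 = rnorm(n), C24 = rnorm(n),
  A3 = rnorm(n, 10, 1), C34 = rnorm(n), A4 = rnorm(n, 10, 1)
)

# Hypothetical helper: one eigendecomposition, returned as a one-row tibble
row_eigenvalues <- function(A1, C12, C13, C14, A2, C23, C24, A3, C34, A4) {
  ev <- eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
                       A3, C34, C14, C24, C34, A4), nrow = 4))$values
  tibble(eig1 = ev[1], eig2 = ev[2], eig3 = ev[3], eig4 = ev[4])
}

eig_post <- w1_post %>%
  rowwise() %>%
  # An unnamed data-frame result is unpacked into its columns (dplyr >= 1.0.0)
  mutate(row_eigenvalues(A1, C12, C13, C14, A2, C23, C24, A3, C34, A4)) %>%
  ungroup() %>%
  select(starts_with("eig"))
```

Each row calls `eigen()` exactly once; the helper function keeps the matrix literal in one place instead of four.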

2 answers:

Answer 0 (score: 1)

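You can avoid computing the eigendecomposition four times by first storing its result in a list column and then extracting the individual values in a subsequent step. If you want to keep everything in the pipe, you can do it like this: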

eig_post <- w1_post %>%
  rowwise %>%
  mutate(
    pre_eig = list(eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
                     A3, C34, C14, C24, C34, A4), nrow = 4)))
  ) %>%
  mutate( 
    eig1 = pre_eig[[1]][1], 
    eig2 = pre_eig[[1]][2], 
    eig3 = pre_eig[[1]][3], 
    eig4 = pre_eig[[1]][4]) %>%
  select(starts_with("eig"))
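For comparison, the same "decompose once, then split" idea can be sketched in base R without `rowwise()`. This is a different technique from the answer above, not a rewrite of it. The lookup vector `idx` is a hypothetical name that maps each entry of the 4x4 matrix (column-major) to one of the ten parameter columns, assuming they are ordered as in `w1_post`. `symmetric = TRUE` tells `eigen()` to treat the matrix as symmetric instead of testing for it, and `only.values = TRUE` skips the eigenvectors, which the question never uses. The data are made up.

```r
# Made-up draws; columns in the same order as w1_post in the question
set.seed(1)
n <- 5
params <- cbind(
  A1 = rnorm(n, 10, 1), C12 = rnorm(n), C13 = rnorm(n), C14 = rnorm(n),
  A2 = rnorm(n, 10, 1), C23 = rnorm(n), C24 = rnorm(n),
  A3 = rnorm(n, 10, 1), C34 = rnorm(n), A4 = rnorm(n, 10, 1)
)

# Column index of each matrix entry, in column-major order:
# c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23, A3, C34, C14, C24, C34, A4)
idx <- c(1, 2, 3, 4,
         2, 5, 6, 7,
         3, 6, 8, 9,
         4, 7, 9, 10)

# One decomposition per row; apply() returns 4 x n, so transpose to n x 4
eig_mat <- t(apply(params, 1, function(p) {
  eigen(matrix(p[idx], nrow = 4), symmetric = TRUE, only.values = TRUE)$values
}))
colnames(eig_mat) <- paste0("eig", 1:4)
```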

Answer 1 (score: 1)

Here is a solution using the purrr::map family of functions:

library(dplyr)
library(purrr)
library(stringr)

eig_post <- w1_post %>%

    ## Collapse columns into a vector
    transmute( x = pmap( list(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
                              A3, C34, C14, C24, C34, A4), c ) ) %>%

    ## Compose the 4x4 matrices from each vector
    mutate( mtx = map( x, matrix, nrow=4 ) ) %>%

    ## Perform a single decomposition and retrieve all 4 eigenvalues
    mutate( eig = map( mtx, ~eigen(.x)$values ) ) %>%

    ## Annotate the vector of eigenvalues with the desired names
    mutate( eig = map( eig, set_names, str_c("eig", 1:4) ) ) %>%

    ## Reshape the data frame by effectively unnesting the vector
    with( do.call( bind_rows, eig ) )
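A more compact variant of the same purrr idea, sketched under the assumption that the tibble's column names match the helper's argument names: `pmap()` called on a data frame passes each row's values to the function by name, so a hypothetical helper `eig_row()` can build the matrix, decompose it once, and return a one-row tibble, after which `bind_rows()` stacks the rows. This skips the intermediate vector and matrix list columns. The five-row `w1_post` is made-up data.

```r
library(dplyr)
library(purrr)

# Small made-up posterior with the same column layout as w1_post
set.seed(1)
n <- 5
w1_post <- tibble(
  A1 = rnorm(n, 10, 1), C12 = rnorm(n), C13 = rnorm(n), C14 = rnorm(n),
  A2 = rnorm(n, 10, 1), C23 = rnorm(n), C24 = rnorm(n),
  A3 = rnorm(n, 10, 1), C34 = rnorm(n), A4 = rnorm(n, 10, 1)
)

# Hypothetical helper: pmap() matches the column names to these arguments
eig_row <- function(A1, C12, C13, C14, A2, C23, C24, A3, C34, A4) {
  ev <- eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
                       A3, C34, C14, C24, C34, A4), nrow = 4),
              symmetric = TRUE, only.values = TRUE)$values
  tibble(eig1 = ev[1], eig2 = ev[2], eig3 = ev[3], eig4 = ev[4])
}

# One eigendecomposition per row, then stack the one-row tibbles
eig_post <- w1_post %>%
  pmap(eig_row) %>%
  bind_rows()
```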