crossprod of vectors by factor in data.table

Date: 2014-09-05 14:31:31

Tags: r matrix data.table

I have a data.table, dists, that looks like this:

Classes ‘data.table’ and 'data.frame':  1800 obs. of  4 variables:
 $ groupname: Factor w/ 8 levels "A","B","C","D",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ start    : int  0 60 120 180 240 300 360 420 480 540 ...
 $ V1       : num  1041 955 962 865 944 ...
 $ vN       : num  0.0042 0.00385 0.00388 0.00349 0.00381 ...
 - attr(*, ".internal.selfref")=<externalptr> 

Here is a dput of the whole thing: http://pastebin.com/VW54NfUg

I can compute the crossprod of vN for each pair of factor levels separately. For example:

crossprod(as.matrix(dists[c(groupname=="C")]$vN), 
          as.matrix(dists[c(groupname=="D")]$vN))
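For two groups, the call above is just a dot product: crossprod() of two n×1 matrices returns a 1×1 matrix holding sum(x * y). A minimal check, assuming two short made-up vectors x and y as hypothetical stand-ins for two groups' vN columns:

```r
# Hypothetical stand-ins for the vN values of two groups
x <- c(0.0042, 0.00385, 0.00388)
y <- c(0.0040, 0.00390, 0.00370)

# crossprod() of two column matrices gives a 1x1 matrix
cp_xy <- crossprod(as.matrix(x), as.matrix(y))
dim(cp_xy)                              # 1 1

# ...whose single entry is the dot product of the two vectors
all.equal(cp_xy[1, 1], sum(x * y))      # TRUE
```

Doing this for every pair of groups is exactly what a single matrix product does in one step.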

But I'd like to do all of them at once and output the results as a single matrix that looks like this:

            C           D           E           F           G           H
C 0.000000000                                               
D 0.003515663 0.000000000                            
E 0.003530643 0.003580947 0.000000000          
F 0.003580947 0.003409901 0.003522218 0.000000000          
G 0.003522218 0.003515663 0.003409901 0.003580947 0.000000000 
H 0.003409901 0.003522218 0.003515663 0.003530643 0.003515663  0.000000000

I have a feeling this is probably very simple, but I'm new to working with data.table and matrices. How can I do this?

2 Answers:

Answer 0 (score: 4)

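Basically you're just describing the matrix product X'X, where the columns of X are the vN values, one column per group. You can compute X with the split-apply-combine paradigm: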
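To see why X'X is the whole answer: entry [i, j] of X'X is the dot product of column i and column j of X, i.e. the same number the one-pair crossprod() call returns. A small sketch on a hypothetical 4×3 matrix whose column names stand in for groups:

```r
# Toy X: 4 time points (rows) by 3 hypothetical groups (columns)
X <- matrix(c(1, 2, 3, 4,     # column C
              0, 1, 0, 1,     # column D
              2, 2, 2, 2),    # column E
            ncol = 3,
            dimnames = list(NULL, c("C", "D", "E")))

cp <- crossprod(X)            # same result as t(X) %*% X

# Entry [i, j] is the dot product of columns i and j
cp["C", "D"]                  # 6  (1*0 + 2*1 + 3*0 + 4*1)
sum(X[, "C"] * X[, "D"])      # 6
```

crossprod(X) computes the same thing as t(X) %*% X without forming the transpose explicitly, and the result is symmetric, so each pair appears twice.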

# Drop the unused factor levels (A and B) so split() creates no empty groups
dists$groupname <- as.character(dists$groupname)

# Define X matrix and compute final table
X <- do.call(cbind, lapply(split(dists, dists$groupname), function(x) x$vN))
(cp <- t(X) %*% X)
#             C           D           E           F           G           H
# C 0.003495762 0.003515663 0.003530643 0.003580947 0.003522218 0.003409901
# D 0.003515663 0.003720479 0.003677919 0.003757778 0.003650462 0.003477723
# E 0.003530643 0.003677919 0.003750939 0.003784916 0.003665951 0.003485093
# F 0.003580947 0.003757778 0.003784916 0.003994177 0.003775697 0.003526653
# G 0.003522218 0.003650462 0.003665951 0.003775697 0.003740864 0.003476628
# H 0.003409901 0.003477723 0.003485093 0.003526653 0.003476628 0.003438210
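The code above relies on every group having the same number of rows, stored in the same start order. If a group were shorter, cbind() would recycle it to the longest length and warn only when the lengths are not multiples of each other. A defensive base-R sketch on hypothetical toy data in the same long layout as dists (dists_toy is made up for illustration); it uses split(..., drop = TRUE) instead of converting to character to skip unused levels:

```r
# Hypothetical toy data in the same long layout as dists
dists_toy <- data.frame(
  groupname = factor(rep(c("C", "D", "E"), each = 3),
                     levels = c("A", "B", "C", "D", "E")),  # A, B unused
  start     = rep(c(0, 60, 120), times = 3),
  vN        = c(1, 2, 3,  4, 5, 6,  7, 8, 9)
)

# Order by start within each group so row k always means the same time point
dists_toy <- dists_toy[order(dists_toy$groupname, dists_toy$start), ]

# drop = TRUE discards the empty groups for the unused levels A and B
cols <- split(dists_toy$vN, dists_toy$groupname, drop = TRUE)

# cbind() would recycle a shorter group, so check the lengths first
stopifnot(length(unique(lengths(cols))) == 1)

X  <- do.call(cbind, cols)    # column names come from the list names
cp <- crossprod(X)
```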

If you want zeros on the main diagonal, as in your desired output, you can set them with diag(cp) <- 0.
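Your desired output also leaves the upper triangle blank. One way to get that layout, assuming a hypothetical symmetric 3×3 matrix in place of the real cp, is to set the upper triangle to NA and print NAs as empty strings:

```r
# Hypothetical symmetric cross-product matrix standing in for cp
cp <- matrix(c(14,  32,  50,
               32,  77, 122,
               50, 122, 194),
             nrow = 3,
             dimnames = list(c("C", "D", "E"), c("C", "D", "E")))

diag(cp) <- 0                 # zeros on the main diagonal
cp[upper.tri(cp)] <- NA       # blank out the redundant upper triangle

print(cp, na.print = "")      # NA cells are printed as empty strings
```

Because X'X is symmetric, the lower triangle alone carries every pairwise value.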

Answer 1 (score: 3)

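As @josilber pointed out, this is simple matrix multiplication; you just need to extract the matrix. Here is a simpler and faster way to do the extraction: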

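setkey(dists, groupname, start) # order by groupname, then by start within each group

# Assumes every group has the same number of rows, one per start value
X = dists[, matrix(vN, ncol = length(unique(groupname)))]
colnames(X) = unique(dists$groupname)

crossprod(X) # same as crossprod(X, X) and t(X) %*% X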
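An alternative that stays within data.table is to reshape to wide format with dcast(), which matches rows on start rather than relying on row order and equal group sizes. A minimal sketch, assuming the data.table package is installed and using hypothetical toy data (dists_toy) in the same long layout as dists:

```r
library(data.table)

# Hypothetical toy data in the same long layout as dists
dists_toy <- data.table(
  groupname = rep(c("C", "D", "E"), each = 3),
  start     = rep(c(0L, 60L, 120L), times = 3),
  vN        = c(1, 2, 3,  4, 5, 6,  7, 8, 9)
)

# One row per start value, one column per group; rows are matched on start
wide <- dcast(dists_toy, start ~ groupname, value.var = "vN")

# Drop the start column and take X'X
X  <- as.matrix(wide)[, -1, drop = FALSE]
cp <- crossprod(X)
```

If some group has no row for a given start, dcast() fills that cell with NA by default, so the affected entries of cp come out as NA instead of being silently computed from recycled values.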
#            C           D           E           F           G           H
#C 0.003495762 0.003515663 0.003530643 0.003580947 0.003522218 0.003409901
#D 0.003515663 0.003720479 0.003677919 0.003757778 0.003650462 0.003477723
#E 0.003530643 0.003677919 0.003750939 0.003784916 0.003665951 0.003485093
#F 0.003580947 0.003757778 0.003784916 0.003994177 0.003775697 0.003526653
#G 0.003522218 0.003650462 0.003665951 0.003775697 0.003740864 0.003476628
#H 0.003409901 0.003477723 0.003485093 0.003526653 0.003476628 0.003438210