我有一个如下所示的矩阵:
col_1 col_2 value
A B 2.1
A C 1.3
B C 4.6
A D 1.4
....
我想要一个相似度矩阵:
A B C D
A X 2.1 1.3 1.4
B 2.1 X 4.6 ...
C ... ... X ...
D ... ... ... X
所以行和列的名称分别是A,B,C,D,它从第三列获取值并将其添加到矩阵中 问题还在于原始矩阵的长度约为10000行。
答案 0 :(得分:2)
罗兰(Roland)建议,您可以使用dcast()
:
library(data.table)
dcast(df, col_1 ~ col_2)
## col_1 B C D
## 1 A 2.1 1.3 1.4
## 2 B NA 4.6 NA
位置:
df <- data.frame(
col_1 = c("A", "A", "B", "A"),
col_2 = c("B","C", "C", "D"),
value = c(2.1, 1.3, 4.6, 1.4)
)
答案 1 :(得分:2)
使用xtabs
和mutate_at
。 sparse = TRUE
将输出转换为sparseMatrix:
library(dplyr)
mat <- df %>%
mutate_at(1:2, factor, levels = unique(c(levels(.$col_1), levels(.$col_2)))) %>%
xtabs(value ~ col_1 + col_2, data=., sparse = TRUE)
mat[lower.tri(mat)] <- mat[upper.tri(mat)]
结果:
4 x 4 sparse Matrix of class "dgCMatrix"
col_2
col_1 A B C D
A . 2.1 1.3 1.4
B 2.1 . 4.6 .
C 1.3 1.4 . .
D 4.6 . . .
答案 2 :(得分:1)
您可以通过以下方式进行操作。 因为没有指定语言,所以我用Python编写了代码
#I assume that your data is in a python pandas dataframe called df
df = ..load your data
list_of_labels = [ 'A','B','C','D' ]
nb_labels = len(list_of_labels)
similarity = np.zeros( (nb_labels,nb_labels) )
for l1, l2, val in zip( df['col_1'] , df['col_2'] , df['value'] ):
i = list_of_labels.index( l1 )
j = list_of_labels.index( l2 )
similarity[i][j] = val
similarity_df = pd.DataFrame(data=similarity, index=list_of_labels, columns=list_of_labels)