谁能解释为什么这两个相关矩阵返回不同的结果?
library(recommenderlab)
data(MovieLense)
cor_mat <- as( similarity(MovieLense, method = "pearson", which = "items"), "matrix" )
cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), use = "pairwise.complete.obs") )
print( cor_mat[1:5, 1:5] )
print( cor_mat_base[1:5, 1:5] )
答案 0 :(得分:2)
dissimilarity() = 1 - pmax(cor(), 0)
R基本函数。另外,重要的是为两个都指定method
使用相同的值:
library("recommenderlab")
data(MovieLense)
cor_mat <- as( dissimilarity(MovieLense, method = "pearson",
which = "items"), "matrix" )
cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), method = "pearson"
, use = "pairwise.complete.obs") )
print( cor_mat[1:5, 1:5] )
print(1- cor_mat_base[1:5, 1:5] )
> print( cor_mat[1:5, 1:5] )
Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
Toy Story (1995) 0.0000000 0.7782159 0.8242057 0.8968647 0.6135248
GoldenEye (1995) 0.7782159 0.0000000 0.7694644 0.7554443 0.7824406
Four Rooms (1995) 0.8242057 0.7694644 0.0000000 1.0000000 0.8153877
Get Shorty (1995) 0.8968647 0.7554443 1.0000000 0.0000000 1.0000000
Copycat (1995) 0.6135248 0.7824406 0.8153877 1.0000000 0.0000000
> print(1- cor_mat_base[1:5, 1:5] )
Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
Toy Story (1995) 0.0000000 0.7782159 0.8242057 0.8968647 0.6135248
GoldenEye (1995) 0.7782159 0.0000000 0.7694644 0.7554443 0.7824406
Four Rooms (1995) 0.8242057 0.7694644 0.0000000 1.2019687 0.8153877
Get Shorty (1995) 0.8968647 0.7554443 1.2019687 0.0000000 1.2373503
Copycat (1995) 0.6135248 0.7824406 0.8153877 1.2373503 0.0000000
要很好地理解它,请检查两个软件包的详细信息:)。
OP /编辑:
重要的是要指出,即使1-dissimilarity
和cor
的值也有些许差异,其中cor
大于1。这是因为dissimilarity()
设置了一个下限为0(即不返回负数),并且也执行cor()
可能返回大于1的值。https://www.rdocumentation.org/packages/stats/versions/3.6.0/topics/cor他们只指定
For r <- cor(*, use = "all.obs"), it is now guaranteed that all(abs(r) <= 1).
应该对此进行评估。