Does the similarity() function in the R recommenderlab package give different results from the cor() function?

Time: 2019-06-19 14:12:00

Tags: r recommender-systems recommenderlab

Can anyone explain why these two correlation matrices return different results?

library(recommenderlab)
data(MovieLense)
cor_mat <- as( similarity(MovieLense, method = "pearson", which = "items"), "matrix" )
cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), use = "pairwise.complete.obs") )
print( cor_mat[1:5, 1:5] )
print( cor_mat_base[1:5, 1:5] )

1 answer:

Answer 0: (score: 2)

dissimilarity() is equal to 1 - pmax(cor(), 0), where cor() is the base R function. It is also important to specify the same value of method for both:

library("recommenderlab")
data(MovieLense)
cor_mat <- as( dissimilarity(MovieLense, method = "pearson", 
                          which = "items"), "matrix" )
cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), method = "pearson"
                                      , use = "pairwise.complete.obs") )
print( cor_mat[1:5, 1:5] )
print(1- cor_mat_base[1:5, 1:5] )

> print( cor_mat[1:5, 1:5] )
                  Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
Toy Story (1995)         0.0000000        0.7782159         0.8242057         0.8968647      0.6135248
GoldenEye (1995)         0.7782159        0.0000000         0.7694644         0.7554443      0.7824406
Four Rooms (1995)        0.8242057        0.7694644         0.0000000         1.0000000      0.8153877
Get Shorty (1995)        0.8968647        0.7554443         1.0000000         0.0000000      1.0000000
Copycat (1995)           0.6135248        0.7824406         0.8153877         1.0000000      0.0000000
> print(1- cor_mat_base[1:5, 1:5] )
                  Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
Toy Story (1995)         0.0000000        0.7782159         0.8242057         0.8968647      0.6135248
GoldenEye (1995)         0.7782159        0.0000000         0.7694644         0.7554443      0.7824406
Four Rooms (1995)        0.8242057        0.7694644         0.0000000         1.2019687      0.8153877
Get Shorty (1995)        0.8968647        0.7554443         1.2019687         0.0000000      1.2373503
Copycat (1995)           0.6135248        0.7824406         0.8153877         1.2373503      0.0000000
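
The relationship stated above can be checked without recommenderlab on a small made-up ratings matrix. This is a minimal sketch in base R only: the matrix `ratings` and its user and item names are hypothetical, and it illustrates the formula 1 - pmax(cor(), 0) from this answer, not recommenderlab's internal code.

```r
# Hypothetical toy ratings: rows are users, columns are items, NA = not rated.
ratings <- matrix(
  c(5, 4, 1,
    4, 5, 2,
    1, 2, 5,
    2, NA, 4,
    5, 4, NA),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("u", 1:5), c("A", "B", "C"))
)

# Pairwise Pearson correlation between items (columns).
r <- cor(ratings, method = "pearson", use = "pairwise.complete.obs")

# Plain 1 - cor(): exceeds 1 wherever the correlation is negative.
d_raw <- 1 - r

# Formula from the answer: negative correlations are floored at 0 first,
# so the result stays within [0, 1] (up to floating-point rounding).
d_floored <- 1 - pmax(r, 0)

print(round(d_raw, 3))
print(round(d_floored, 3))
```

Items A and C are rated in exactly opposite order by the users who rated both, so their correlation is negative: the plain 1 - cor() value is above 1 while the floored version is capped at 1, which is the same pattern as the Four Rooms / Get Shorty cell in the output above.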

To understand this properly, check the details of both packages :).

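OP / edit: It is worth pointing out that dissimilarity() and 1 - cor() still differ slightly, in the cells where 1 - cor() is greater than 1. This is because dissimilarity() sets a lower bound of 0 on the correlation (negative correlations are not passed through), so it never returns more than 1, whereas 1 - cor() does whenever cor() is negative. In addition, cor() is not documented as bounded for every use setting: https://www.rdocumentation.org/packages/stats/versions/3.6.0/topics/cor only specifies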

  

For r <- cor(*, use = "all.obs"), it is now guaranteed that all(abs(r) <= 1).

This should be taken into account when comparing the two results.
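
One way to check this on a given correlation matrix is sketched below. The helper `check_cor_bounds` and its `tol` argument are hypothetical names introduced here for illustration, and the small matrix `x` is made-up data, not MovieLense.

```r
# Hypothetical helper: summarise how a correlation matrix relates to the
# bounds discussed above. `tol` absorbs floating-point rounding.
check_cor_bounds <- function(r, tol = 1e-8) {
  vals <- r[!is.na(r)]
  list(
    range        = range(vals),
    out_of_range = sum(abs(vals) > 1 + tol),  # entries outside [-1, 1]
    negative     = sum(vals < 0)              # entries that flooring sets to 0
  )
}

# Small made-up example with missing values.
x <- cbind(a = c(1, 2, 3, NA, 5),
           b = c(5, 3, NA, 2, 1),
           c = c(2, NA, 4, 5, 6))
res <- check_cor_bounds(cor(x, use = "pairwise.complete.obs"))
print(res)
```

On the data from the question, the same check could be run as check_cor_bounds(cor_mat_base); the `negative` count then gives the number of cells in which dissimilarity() and 1 - cor() are expected to disagree.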