Question

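I have a sparse occurrence matrix containing product similarity data. All products appear in the same order on the x and y axes. A value of 1 means the two products are the same, and a value of 0 means they are different.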

For example:

    P1  P2  P3  P4
P1  1   1   0   0
P2  0   1   0   1
P3  0   0   1   1
P4  0   1   0   1

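In this case, P1 is similar to itself and to P2, but P2 is also similar to P4. So in the end P1, P2 and P4 are all the same product. I need to write something in R that assigns the same code to P1, P2 and P4: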

Product_Name  Ref_Code 
     P1          P1
     P2          P1
     P3          P3
     P4          P1

Is it possible to do this in R?

Cheers,

Dario.

Answer 1

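I agree with @Prem: following your logic, all of the products are the same. I have put together a code example using the reshape2 package that brings your products into long format. Even though your similarity measure does not produce any distinction between the products, you can take the output of melt() and sort and filter the data in different ways to get the result you want.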

library(reshape2)

data <- read.table ( text = "P1  P2  P3  P4
                          P1  1   1   0   0
                          P2  0   1   0   1
                          P3  0   0   1   1
                          P4  0   1   0   1"
                          , header = TRUE, stringsAsFactors = FALSE)


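# move the row names into a regular column so melt() can use them as the id
data <- cbind(rownames(data), data)
names(data)[1] <- "product1"

data.melt <- melt(data
             , id.vars = "product1"
             , measure.vars = colnames(data)[2:ncol(data)]
             , variable.name = "product2"
             , value.name = "similarity"
             , factorsAsStrings = TRUE)

# melt() returns product2 as a factor; convert it so that unlist() below
# yields product names rather than integer factor codes
data.melt$product2 <- as.character(data.melt$product2)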

#check the output of melt, maybe the long format is suitable for your task    
data.melt

#if you split the data by your similarity and check the unique products
#in each list, you will see that they are all the same
data.split <- split(data.melt, data.melt$similarity)

lapply(data.split, function(x) {
  unique(unlist(x[, c("product1", "product2")]))
})
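
To make the "all products are the same" point concrete, here is a minimal base R sketch that follows the similarity links in the question's matrix until no new links appear. It assumes that similarity is symmetric (a 1 in either direction counts as a link), which the question does not state explicitly. The names adj, reach and ref_code are only illustrative.

```r
# The question's matrix, rebuilt as a logical adjacency matrix.
products <- c("P1", "P2", "P3", "P4")
adj <- matrix(c(1, 1, 0, 0,
                0, 1, 0, 1,
                0, 0, 1, 1,
                0, 1, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(products, products)) > 0

# Assumption: similarity is symmetric, so a link in either direction counts.
reach <- adj | t(adj)

# Extend every known path by one more step until nothing changes
# (a simple transitive closure; fine for small matrices).
repeat {
  longer <- (reach %*% reach) > 0
  if (all(longer == reach)) break
  reach <- longer
}

# TRUE when every product can be reached from every other product.
all_linked <- all(reach)

# Reference code = first product (in matrix order) reachable from each row.
ref_code <- products[apply(reach, 1, which.max)]
data.frame(Product_Name = products, Ref_Code = ref_code)
```

With the matrix as posted, P3 is linked to P4 and P4 to P2, so every product ends up with the same reference code. To get P3 on its own, as in the desired output, the P3/P4 entry would have to be 0.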

Answer 2

Another approach could be:

library(Matrix)

#sample data (to understand this approach better I have slightly modified your input data)
mat <- Matrix(data = c(1,0,0,0,0,1,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1), nrow = 5, ncol = 5,
              dimnames = list(c("P1","P2","P3","P4","P5"),c("P1","P2","P3","P4","P5")),
              sparse = TRUE)
mat

#create dataframe having relationship among similar products
mat_summary <- summary(mat)
df <- data.frame(Product_Name = rownames(mat)[mat_summary$i],
                 Similar_Product_Name = colnames(mat)[mat_summary$j])
df <- df[df$Product_Name != df$Similar_Product_Name, ]
df

#clustering - to get the final result
library(igraph)
library(data.table)
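# graph_from_data_frame() and components() are the current igraph names for
# the deprecated graph.data.frame() and clusters()
df.g <- graph_from_data_frame(df)
final_df <- setNames(setDT(as.data.frame(components(df.g)$membership), keep.rownames = TRUE)[], c('Product', 'Product_Cluster'))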
final_df

The output is:

   Product Product_Cluster
1:      P1               1
2:      P4               1
3:      P2               1
4:      P3               2
5:      P5               2
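
If you want the Product_Name / Ref_Code layout from the question instead of numeric cluster ids, one possible sketch in base R is below. The membership vector is hard-coded from the output above so that the snippet runs on its own. In practice it would come from the membership element of the clustering result. Using the alphabetically first product of each cluster as the reference code is my assumption about what the question wants.

```r
# Cluster membership as printed above (product name -> cluster id).
membership <- c(P1 = 1, P4 = 1, P2 = 1, P3 = 2, P5 = 2)

# Sort by product name so the "first" member of each cluster is well defined.
membership <- membership[order(names(membership))]

# Within each cluster, use the first product name as the shared reference code.
ref_code <- ave(names(membership), membership, FUN = function(x) x[1])

result <- data.frame(Product_Name = names(membership),
                     Ref_Code = ref_code,
                     stringsAsFactors = FALSE)
result
```

Here P1, P2 and P4 all get P1 as their code, while P3 and P5 share P3.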
