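I have a sparse occurrence matrix containing product similarity data. All products appear in the same order on the x and y axes; a value of 1 means two products are the same, and a value of 0 means they are different.
For example: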
   P1 P2 P3 P4
P1  1  1  0  0
P2  0  1  0  1
P3  0  0  1  1
P4  0  1  0  1
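In this case, P1 is similar to itself and to P2, and P2 is in turn similar to P4. So in the end P1, P2 and P4 are all the same product. I need to write something in R that assigns the same code to P1, P2 and P4: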
Product_Name Ref_Code
P1 P1
P2 P1
P3 P3
P4 P1
Is it possible to do this in R?
Cheers,
Dario.
Answer 0 (score: 1)
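I agree with @Prem: by your logic, all of the products are the same, because row P3 also has a 1 in column P4, which ties P3 to the P1/P2/P4 group. I have provided a code example that uses the reshape2 package to put your products into long format. Even though your similarity measure does not separate the products from one another, you can take the output of melt() and sort and filter the data in different ways to get the result you want.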
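library(reshape2)

# read the matrix; the header has one field fewer than the rows,
# so read.table() uses the first column as row names
data <- read.table(text = "P1 P2 P3 P4
P1 1 1 0 0
P2 0 1 0 1
P3 0 0 1 1
P4 0 1 0 1",
                   header = TRUE, stringsAsFactors = FALSE)

# move the row names into a regular column so melt() can use it as the id
data <- cbind(rownames(data), data)
names(data)[1] <- "product1"

data.melt <- melt(data,
                  id.vars = "product1",
                  measure.vars = colnames(data)[2:ncol(data)],
                  variable.name = "product2",
                  value.name = "similarity",
                  factorsAsStrings = TRUE)

# melt() returns product2 as a factor; convert both name columns to character
# so that unlist() below combines names instead of integer factor codes
data.melt$product1 <- as.character(data.melt$product1)
data.melt$product2 <- as.character(data.melt$product2)

# check the output of melt, maybe the long format is suitable for your task
data.melt

# if you split the data by your similarity and check the unique products
# in each list, you will see that they are all the same
data.split <- split(data.melt, data.melt$similarity)
lapply(data.split, function(x) {
  unique(unlist(x[, c("product1", "product2")]))
})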
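The claim that all four products collapse into one group can be checked directly. The following is a minimal base R sketch, not part of the original answer, and it assumes that similarity should be treated as symmetric and transitive (if A matches B and B matches C, then A matches C). The names `m`, `reach` and `ref_code` are made up for illustration.

```r
# Similarity matrix from the question (rows and columns in the same order).
m <- matrix(c(1, 1, 0, 0,
              0, 1, 0, 1,
              0, 0, 1, 1,
              0, 1, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("P", 1:4), paste0("P", 1:4)))

# Assumption: i and j are linked if either m[i, j] or m[j, i] is 1.
# The identity matrix makes every product reachable from itself.
reach <- 1 * ((m + t(m) + diag(nrow(m))) > 0)

# Square the reachability matrix until it stops changing (transitive closure).
repeat {
  nxt <- 1 * ((reach %*% reach) > 0)
  if (all(nxt == reach)) break
  reach <- nxt
}

# Label each product with the first product (in matrix order) it can reach.
ref_code <- rownames(m)[apply(reach, 1, function(r) which(r > 0)[1])]
data.frame(Product_Name = rownames(m), Ref_Code = ref_code)
```

With the matrix exactly as posted, every product ends up with the reference code P1, because the 1 in row P3, column P4 connects P3 to the others. Setting that entry to 0 would leave P3 in its own group, which is what the expected output in the question shows.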
Answer 1 (score: 0)
Another approach could be:
#sample data (to understand this approach better I have slightly modified your input data)
library(Matrix)
mat <- Matrix(data = c(1,0,0,0,0,1,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1), nrow = 5, ncol = 5,
              dimnames = list(c("P1","P2","P3","P4","P5"),c("P1","P2","P3","P4","P5")),
              sparse = TRUE)
mat

#create dataframe having relationship among similar products
#summary() of a sparse matrix lists the non-zero entries as (i, j, x) triplets
mat_summary <- summary(mat)
df <- data.frame(Product_Name = rownames(mat)[mat_summary$i],
                 Similar_Product_Name = colnames(mat)[mat_summary$j])
#drop the diagonal (each product matched with itself)
df <- df[df$Product_Name != df$Similar_Product_Name, ]
df

#clustering - to get the final result
library(igraph)
library(data.table)
#graph_from_data_frame() and components() are the current names of
#graph.data.frame() and clusters()
df.g <- graph_from_data_frame(df)
final_df <- setNames(setDT(as.data.frame(components(df.g)$membership), keep.rownames = TRUE)[], c('Product', 'Product_Cluster'))
final_df
The output is:
Product Product_Cluster
1: P1 1
2: P4 1
3: P2 1
4: P3 2
5: P5 2
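The question asks for a Ref_Code that is a product name rather than a cluster number. One possible way to get there, sketched here in base R and not part of the original answer, is to label every cluster with its first member. The membership table is copied by hand from the output above, and the `Ref_Code` column and the choice of the alphabetically first product as label are assumptions for illustration.

```r
# Cluster membership as printed above (copied by hand for illustration).
final_df <- data.frame(
  Product = c("P1", "P4", "P2", "P3", "P5"),
  Product_Cluster = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Sort by product name so that the "first" member of each cluster is well defined.
# Note: this is a plain string sort, so "P10" would sort before "P2".
final_df <- final_df[order(final_df$Product), ]

# ave() applies the function within each cluster and returns one value per row.
final_df$Ref_Code <- ave(final_df$Product, final_df$Product_Cluster,
                         FUN = function(x) x[1])
final_df[, c("Product", "Ref_Code")]
```

Here P1, P2 and P4 receive P1, while P3 and P5 receive P3. Because the graph is built only from pairs of distinct products, a product that is similar to nothing but itself would not appear in the membership table and would need to be added back with its own name as Ref_Code.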