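The goal is to fill an upper triangular matrix with counts computed from a ratings dataset. Each value is computed and stored by looking up the correct row and column index, so the values are not written in sequential order. The R code below works correctly, but it takes far too long on large datasets.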
ratings <- read.csv("ratings.csv", header=TRUE, sep=",")
> head(ratings)
  userId movieId rating  timestamp
1      1      16    4.0 1217897793
2      1      24    1.5 1217895807
3      1      32    4.0 1217896246
4      1      47    4.0 1217896556
5      1      50    4.0 1217896523
6      1     110    4.0 1217896150
# `movies` is assumed to be a data frame with a movieId column, loaded elsewhere.
no_nodes <- nrow(movies)*2
temp <- movies$movieId
nodes_name <- c(paste(temp,"-L",sep=""),paste(temp,"-D",sep=""))
ac_graph <- matrix(NA,nrow=no_nodes,ncol=no_nodes,dimnames = list(nodes_name,nodes_name))
# Stop at n-1: on the last i, (i+1):nrow(movies) would count down instead of being empty.
for(i in seq_len(nrow(movies)-1)){
  for(j in (i+1):nrow(movies)){
    # Look up by the actual movie IDs, not by loop position.
    mi <- temp[i]; mj <- temp[j]
    # "-L" block: number of users who rated both movies above 2.5.
    ac_graph[paste(mi,"-L",sep=""),paste(mj,"-L",sep="")] <- length(intersect(ratings[ratings$movieId==mi&ratings$rating>2.5,1],ratings[ratings$movieId==mj&ratings$rating>2.5,1]))
    # "-D" block: number of users who rated both movies 2.5 or below.
    ac_graph[paste(mi,"-D",sep=""),paste(mj,"-D",sep="")] <- length(intersect(ratings[ratings$movieId==mi&ratings$rating<=2.5,1],ratings[ratings$movieId==mj&ratings$rating<=2.5,1]))
  }
}
Is it possible to do the same thing with apply, sapply, outer, or some other function?
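
One direction that might avoid the double loop entirely, sketched here under assumptions rather than offered as a tested solution for the full data: build a 0/1 user-by-movie indicator matrix for each rating group and let crossprod() count the shared users for every movie pair at once. The toy `ratings` and `movies` values and the helper name `co_counts` are made up for illustration; only base R functions are used.

```r
# Toy stand-ins for ratings.csv and the movies table (hypothetical values).
ratings <- data.frame(
  userId  = c(1, 1, 1, 2, 2, 2, 3, 3),
  movieId = c(10, 20, 30, 10, 20, 30, 10, 20),
  rating  = c(4.0, 5.0, 1.0, 3.0, 4.5, 2.0, 1.5, 2.5)
)
movies <- data.frame(movieId = c(10, 20, 30))
ids <- movies$movieId
n <- length(ids)

# Hypothetical helper: for every pair of movies, count the users present in `sub` for both.
co_counts <- function(sub, ids) {
  # Rows = users, columns = movies (in the order of `ids`).
  inc <- table(sub$userId, factor(sub$movieId, levels = ids))
  # Reduce to a 0/1 indicator so repeated ratings are not counted twice.
  inc <- (unclass(inc) > 0) * 1
  # t(inc) %*% inc: entry [a, b] is the number of users shared by movies a and b.
  crossprod(inc)
}

liked    <- co_counts(ratings[ratings$rating >  2.5, ], ids)
disliked <- co_counts(ratings[ratings$rating <= 2.5, ], ids)

nodes_name <- c(paste(ids, "-L", sep = ""), paste(ids, "-D", sep = ""))
ac_graph <- matrix(NA, nrow = 2 * n, ncol = 2 * n,
                   dimnames = list(nodes_name, nodes_name))

# (row, col) positions of the strict upper triangle of an n x n block.
pairs <- which(upper.tri(liked), arr.ind = TRUE)
ac_graph[pairs]     <- liked[pairs]      # "-L" block occupies rows/cols 1..n
ac_graph[pairs + n] <- disliked[pairs]   # "-D" block occupies rows/cols n+1..2n
```

The indicator matrices here are dense, so for a large number of users and movies a sparse representation (for example from the Matrix package) would probably be needed to keep memory manageable.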