I'm new to R. I have been mining the MovieLens 1M dataset to build a recommender system, but when I run the code below I get an error:
> ##read the rating data for all users
> readData<-function(){
+ ratingDF <- read.delim("ratings.dat", sep=':',header=F)
+ ratingDF <- subset(ratingDF, select = c("V1","V3","V5","V7"))
+ names(ratingDF) <- c("userID","movieID","rating","timestamp")
+
+ moviesDF <- readLines("movies.dat")
+ moviesDF <- as.data.frame(do.call("rbind",strsplit(moviesDF,"::")),stringsAsFactors = FALSE)
+ names(moviesDF) <- c("movieID","Title","Genre")
+
+ return(list(ratingDF=ratingDF, movieDF=moviesDF))
+ }
>
>
> ##data cleansing and processing
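> preProcess <- function(ratingDF, moviesDF){
+
+ #replace each movieID with its title, looked up by ID rather than by row position
+ ratingDF[,2] <- moviesDF$Title[match(ratingDF[,2], as.numeric(moviesDF$movieID))]
+ #remove duplicate
+ ratingDF <- ratingDF[!duplicated(ratingDF[,1:2]),]
+ return(ratingDF)
+ }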
>
>
> createRatingMatrix <- function(ratingDF){
+
+ #converting the ratingData data frame into rating matrix
+ ratingDF_tmp <- dcast(ratingDF, userID ~ movieID, value.var = "rating", index ="userID")
+ ratingDF <- ratingDF_tmp[,2:ncol(ratingDF_tmp)]
+
+ ratingMat<-as(ratingDF,"matrix")
+ movieRatingMat<-as(ratingMat,"realRatingMatrix")
+
+ #setting up the dimnames
+ dimnames(movieRatingMat)[[1]]<-row.names(ratingDF)
+ return(movieRatingMat)
+
+ }
>
> #create recommender model
> evaluateModels<-function(movieRatingMat){
+
+ #find out and analyze the available recommendation algorithm options for realRatingMatrix data
+ recommenderRegistry$get_entries(dataType="realRatingMatrix")
+ scheme <- evaluationScheme(movieRatingMat, method="split", train=.9, k=1, given=10, goodRating=4)
+ algorithms<-list(
+ RANDOM = list(name="RANDOM", param=NULL),
+ POPULAR = list(name="POPULAR", param=NULL),
+ UBCF = list(name="UBCF", param=NULL),
+ IBCF = list(name="IBCF", param=NULL)
+ )
+
+ #run algorithms, predict next n movie
+ results<-evaluate(scheme, algorithms, n=c(1,3,5,10,15,20))
+
+ #select the first results
+ return(results)
+
+ }
>
>
> ##load movie lens data
> dataList<-readData()
>
> ratingDF<-preProcess(dataList$ratingDF, dataList$movieDF)
>
>
> movieRatingMat<-createRatingMatrix(ratingDF)
>
> evalList<-evaluateModels(movieRatingMat)
Error in .local(data, ...) : Some observations have size<given!
>
I already know that the problem is caused by the given parameter, but I don't understand why it happens.
Answer 0 (score: 2)
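In recommenderlab, the given parameter is either a single number of items given for evaluation, or a vector with one entry per observation giving the number of items given for that observation. You get the error because at least one user in your data has fewer ratings than the value of given.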
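To make the condition concrete, here is a minimal base-R sketch of the comparison that the error message describes. The toy matrix, the user and movie names, and the value of given are all made up for illustration, and this is not recommenderlab's actual implementation, only the same idea written out by hand.

```r
# Toy rating matrix: rows are users, columns are movies, NA = not rated.
# (Hypothetical data, not taken from MovieLens.)
ratings <- matrix(
  c( 5,  3, NA,  4,
    NA,  2, NA, NA,
     4,  4,  5,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3"), c("m1", "m2", "m3", "m4"))
)

given <- 2

# Number of ratings each user actually has.
ratings_per_user <- rowSums(!is.na(ratings))

# Users with fewer ratings than `given` cannot supply `given` known items,
# which is the situation the "size<given" message refers to.
too_small <- names(ratings_per_user)[ratings_per_user < given]

print(ratings_per_user)
print(too_small)
```

Here u2 has rated only one movie, so asking for 2 given items per user cannot be satisfied for that row.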
Answer 1 (score: 0)
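@Kavipriya: Some users (i.e. some rows) in your movieRatingMat matrix have probably rated fewer than 10 movies. In that case you are asking recommenderlab to use 10 ratings per user anyway, which it cannot do because those users do not have enough ratings. The fix is to use a lower value for given, or to pass in a matrix in which every user has a sufficient number of ratings. If you are not familiar with the maths behind it: the package computes a distance between two vectors, and both vectors must have the same length. Hope this helps.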
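As a rough sketch of the second option, the rows with too few ratings can be dropped before the evaluation scheme is built. The example below assumes the recommenderlab package is installed and uses a small made-up matrix in place of movieRatingMat; rowCounts() and row subsetting with [ are the only recommenderlab features it relies on.

```r
library(recommenderlab)

# Toy stand-in for movieRatingMat (hypothetical data): NA = not rated.
m <- matrix(
  c( 5,  3, NA,  4,  2,
    NA,  2, NA, NA, NA,
     4,  4,  5,  1, NA,
    NA,  5,  3, NA,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("m", 1:5))
)
r <- as(m, "realRatingMatrix")

given <- 3

# Largest value of `given` the unfiltered data could support.
max_given <- min(rowCounts(r))

# Keep only users who have at least `given` ratings.
r_ok <- r[rowCounts(r) >= given, ]

print(max_given)
print(rowCounts(r_ok))
```

With the real data, the same filter would be applied to movieRatingMat with given = 10 before calling evaluationScheme(). Alternatively, given could be set no higher than min(rowCounts(movieRatingMat)), ideally lower so that each test user still has some ratings left to predict.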