Spark MLlib Collaborative Filtering, ValueError: RDD is empty

Date: 2016-04-24 18:42:46

Tags: apache-spark pyspark apache-spark-mllib

I am new to Spark, and I am running the implicit collaborative filtering example from MLlib from here. When I run the following code on my data, I get this error:

 ValueError: RDD is empty

Here is my data:

 101,1000010,1
 101,1000011,1
 101,1000015,1
 101,1000017,1
 101,1000019,1
 102,1000010,1
 102,1000012,1
 102,1000019,1
 103,1000011,1
 103,1000012,1
 103,1000013,1
 103,1000014,1
 103,1000017,1
 104,1000010,1
 104,1000012,1
 104,1000013,1
 104,1000014,1
 104,1000015,1
 104,1000016,1
 104,1000017,1
 105,1000017,1

My code:

 from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
 data = sc.textFile("s3://xxxxxxxxxxxx.csv")

 ratings = data.map(lambda l: l.split(','))\
               .map(lambda l: Rating(l[0], l[1], float(l[2])))

 # Build the recommendation model using Alternating Least Squares
 rank = 10
 numIterations = 10
 alpha = 0.01
 model = ALS.trainImplicit(ratings, rank, numIterations, alpha)

 # Evaluate the model on training data
 testdata = ratings.map(lambda p: (p[0], p[1]))
 predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
 ratesAndPreds = ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
 # convert the joined RDD to a DataFrame
 ratesAndPreds.toDF().show()
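
For what it's worth, `trainImplicit` raises `ValueError: RDD is empty` when it calls `first()` on an empty RDD, so a first step is to confirm that anything at all is being read from the S3 path. Below is a minimal diagnostic sketch, reusing the `sc` and the placeholder path from the snippet above; the blank-line filter and the explicit `int()` casts are defensive additions, not part of the original code:

 from pyspark.mllib.recommendation import Rating

 data = sc.textFile("s3://xxxxxxxxxxxx.csv")

 # If this prints 0, the path is wrong or the file is empty,
 # which is enough to trigger "ValueError: RDD is empty".
 print(data.count())
 print(data.take(3))

 # Drop blank lines (split(',') turns them into ['']) and cast
 # the user and product IDs to int explicitly.
 ratings = (data.filter(lambda l: l.strip())
                .map(lambda l: l.split(','))
                .map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2]))))
 print(ratings.take(3))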

0 Answers:

No answers yet