I'm new to Spark, and I'm running the implicit collaborative filtering example from MLlib from here. When I run the code below on my data, I get the following error:
ValueError: RDD is empty
Here is my data:
101,1000010,1
101,1000011,1
101,1000015,1
101,1000017,1
101,1000019,1
102,1000010,1
102,1000012,1
102,1000019,1
103,1000011,1
103,1000012,1
103,1000013,1
103,1000014,1
103,1000017,1
104,1000010,1
104,1000012,1
104,1000013,1
104,1000014,1
104,1000015,1
104,1000016,1
104,1000017,1
105,1000017,1
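Each row is user, product, rating, in that order, which is how the code below reads it. Just as an illustration (this snippet is not part of my program), the first row splits like this:

line = "101,1000010,1"
fields = line.split(',')   # ['101', '1000010', '1'] -- each field is still a string until it is cast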
My code:
from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
data = sc.textFile("s3://xxxxxxxxxxxx.csv")
ratings = data.map(lambda l: l.split(','))\
.map(lambda l: Rating(l[0], l[1], float(l[2])))
# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 10
alpha = 0.01
model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
# Evaluate the model on training data
testdata = ratings.map(lambda p: (p[0], p[1]))
predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
ratesAndPreds = ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
# Convert the joined RDD to a DataFrame and display it
ratesAndPreds.toDF().show()
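For debugging, a rough sketch like the following (reusing the variables defined above, not part of the original program) would show which stage ends up empty before toDF() is called:

# Diagnostic sketch: count each intermediate RDD to see which stage is empty.
print("ratings: %d" % ratings.count())
print("testdata: %d" % testdata.count())
print("predictions: %d" % predictions.count())
print("ratesAndPreds: %d" % ratesAndPreds.count())
# If ratesAndPreds is 0 while the earlier counts are non-zero, the join matched nothing --
# worth checking that the (user, product) keys on both sides of the join have the same type.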