// As requested, I updated my code for the value 'combinations'.
// val list is an RDD of type org.apache.spark.rdd.RDD[(Int, Array[Int])]
val combinations = list.mapValues(_.toSeq.combinations(2).toArray.map{ case Seq(x,y) => (x,y)}).map(_._2)
// combinations is an RDD[Array[(Int, Int)]]; collected, its contents look like:
// Array(Array((1953,1307), (1953,527), (1953,1272), (1953,1387), (1953,318)), Array(( ...))...)
val simOnly = combinations.foreach { x =>
  x.map { case (item_1, item_2) =>
    // These lookups reference the productFeatures RDD from inside a
    // transformation on another RDD, which triggers the exception below.
    val itemFactor_1 = modelMLlib.productFeatures.lookup(item_1).head
    val itemFactor_2 = modelMLlib.productFeatures.lookup(item_2).head
    val itemVector_1 = new DoubleMatrix(itemFactor_1)
    val itemVector_2 = new DoubleMatrix(itemFactor_2)
    val sim = cosineSimilarity(itemVector_1, itemVector_2)
    sim
  }
}
This is my code for computing the cosine similarity between items, but Apache Spark does not support nested RDDs.
How can I solve this correctly?
And my interpreter shows this:
org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases:
(1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
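From reading SPARK-5063, I believe the usual workaround is to avoid touching one RDD inside a transformation of another, e.g. by collecting the product factors to the driver and broadcasting them. Below is only a sketch of that idea, assuming the factor map fits in driver memory; sc is the active SparkContext, and cosineSimilarity here mirrors my helper above (not shown in the original code).

import org.jblas.DoubleMatrix

// Collect (productId -> factors) into a plain Map on the driver.
val factorMap: Map[Int, Array[Double]] = modelMLlib.productFeatures.collect().toMap

// Ship the map to the executors once as a broadcast variable.
val factorsBc = sc.broadcast(factorMap)

// Local helper, mirroring the cosineSimilarity used above.
def cosineSimilarity(a: DoubleMatrix, b: DoubleMatrix): Double =
  a.dot(b) / (a.norm2() * b.norm2())

// The inner computation now touches only the broadcast value,
// so no RDD is referenced inside the transformation.
val sims = combinations.map { pairs =>
  pairs.map { case (item1, item2) =>
    val v1 = new DoubleMatrix(factorsBc.value(item1))
    val v2 = new DoubleMatrix(factorsBc.value(item2))
    cosineSimilarity(v1, v2)
  }
}

Is this the right approach, or is a join-based solution preferred when the factor matrix is too large to broadcast?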