Getting specific values from one RDD based on another RDD

Date: 2016-01-30 06:39:53

Tags: scala apache-spark rdd

I want to map over one RDD while looking up values in another RDD, using the following code:

val product = numOfT.map { case ((a, b), c) =>
  val h = keyValueRecords.lookup(b).take(1).mkString.toInt
  (a, h * c)
}

Here a and b are Strings and c is an Int. keyValueRecords is an RDD[(String, String)]. I get a type mismatch error. How can I fix it? What is my mistake?
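For reference, Spark does not support calling an RDD operation such as lookup from inside another RDD's map: transformations cannot reference other RDDs, so this pattern fails even when the types line up. A common workaround, assuming keyValueRecords really is a reasonably small RDD[(String, String)] and following the types stated in the question (a, b strings, c an integer), is to collect it into a Map on the driver and broadcast it. A minimal sketch (the function name and the fallback value 0 are only illustrative):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch: broadcast the lookup table instead of calling lookup() inside map.
def multiplyByLookup(sc: SparkContext,
                     numOfT: RDD[((String, String), Int)],
                     keyValueRecords: RDD[(String, String)]): RDD[(String, Int)] = {
  // Collect the small RDD on the driver and ship it to every executor once.
  val table = sc.broadcast(keyValueRecords.collectAsMap())
  numOfT.map { case ((a, b), c) =>
    // Missing keys fall back to 0 here; adjust to whatever makes sense.
    val h = table.value.get(b).map(_.toInt).getOrElse(0)
    (a, h * c)
  }
}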

Here is a sample of the data:

userId,movieId,rating,timestamp
1,16,4.0,1217897793
1,24,1.5,1217895807
1,32,4.0,1217896246
2,3,2.0,859046959
3,7,3.0,8414840873

I build the RDDs with this code:

val lines = sc.textFile("ratings.txt").map(s => {
  val substrings = s.split(",")
  (substrings(0), (substrings(1), substrings(2)))   // (userId, (movieId, rating))
})

val shoppingList = lines.groupByKey()

val coOccurence = shoppingList.flatMap { case (k, v) =>
  val arry1 = v.toArray
  val arry2 = v.toArray
  val pairs = for (pair1 <- arry1; pair2 <- arry2) yield ((pair1, pair2), 1)
  pairs.iterator
}

val numOfT = coOccurence.reduceByKey((a, b) => a + b)   // (((item, rate), (item, rate)), co-occurrence count)
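For what it is worth, writing out the types that this pipeline produces may help pin down the type mismatch: the keys of numOfT are pairs of (movieId, rating) pairs, not plain strings, which would conflict with the lookup snippet above where a and b are assumed to be strings. A type-annotated restatement (annotations are illustrative only, behaviour unchanged):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

val sc = new SparkContext(new SparkConf().setAppName("co-occurrence").setMaster("local[*]"))

val lines: RDD[(String, (String, String))] =
  sc.textFile("ratings.txt").map { s =>
    val f = s.split(",")
    (f(0), (f(1), f(2)))                       // (userId, (movieId, rating))
  }

val shoppingList: RDD[(String, Iterable[(String, String)])] = lines.groupByKey()

val coOccurence: RDD[(((String, String), (String, String)), Int)] =
  shoppingList.flatMap { case (_, items) =>
    val arr = items.toArray
    for (p1 <- arr; p2 <- arr) yield ((p1, p2), 1)
  }

// Key: ((movieId, rating), (movieId, rating)); value: co-occurrence count.
val numOfT: RDD[(((String, String), (String, String)), Int)] =
  coOccurence.reduceByKey(_ + _)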

// produce recommendations for a particular user


val keyValueRecords = sc.textFile("ratings.txt").map(s => {
  val substrings = s.split(",")
  (substrings(0), (substrings(1), substrings(2)))
}).filter { case (k, v) => k == "1" }
  .groupByKey()
  .flatMap { case (k, v) =>
    val arry1 = v.toArray
    val arry2 = v.toArray
    val pairs = for (pair1 <- arry1; pair2 <- arry2) yield ((pair1, pair2), 1)
    pairs.iterator
  }

val numOfTForaUser = keyValueRecords.reduceByKey((a, b) => a + b)

val joined = numOfT.join(numOfTForaUser)
  .map { case (k, v) => (k._1._1, k._2._2.toFloat * v._1.toFloat) }
  .collect
  .foreach(println)

The last RDD is never produced. What is wrong here?
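Two things may be worth checking here (a debugging sketch, not part of the original code). First, as written joined holds the Unit result of foreach(println) rather than an RDD, because collect and foreach are chained onto the same expression. Second, join only keeps keys that occur in both RDDs, so if nothing is printed it is likely that no ((movieId, rating), (movieId, rating)) key from numOfT also appears in numOfTForaUser. One way to inspect this, keeping the same map logic but destructuring the tuples:

// Debugging sketch: keep joined as an RDD, then run actions on it separately.
val joined = numOfT.join(numOfTForaUser)
  .map { case (((item1, _), (_, rate2)), (count, _)) =>
    (item1, rate2.toFloat * count.toFloat)
  }

println(s"joined contains ${joined.count()} elements")   // 0 means no keys matched
joined.take(10).foreach(println)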

0 Answers:

No answers yet.