How do I traverse an array inside a Spark RDD?
I collect table1 into a driver-side array, then try to scan that array from inside a map over the table2 RDD, appending matches to an outer variable:
import scala.collection.mutable.ArrayBuffer

var dataResult: Array[Array[String]] = Array()

// table1, collected to the driver as Array(id, lon, lat) rows
val data1 = hiveContext.sql("select id, lon, lat from table1").rdd.map(
  row => (row.getAs[String]("id"), row.getAs[Double]("lon"), row.getAs[Double]("lat"))
).map(
  u => Array(u._1, u._2.toString, u._3.toString)
).collect()

val data2 = hiveContext.sql("select id2, lon, lat from table2").rdd.map(
  row => (row.getAs[String]("id2"), row.getAs[Double]("lon"), row.getAs[Double]("lat"))
)

// for each table2 row, append every table1 row whose lon and lat are both within 0.2
var data3 = data2.map(u => {
  for (i <- 0 until data1.length) {
    if (u._2 + 0.2 >= data1(i)(1).toDouble && u._2 - 0.2 <= data1(i)(1).toDouble &&
        u._3 + 0.2 >= data1(i)(2).toDouble && u._3 - 0.2 <= data1(i)(2).toDouble) {
      dataResult ++= ArrayBuffer(Array(u._1, u._2.toString, u._3.toString,
        data1(i)(0), data1(i)(1), data1(i)(2)))
    }
  }
  dataResult.toArray
})
When I print the collected result, I get:
Array[String] = Array([Ljava.lang.String;@c0ae5d5, [Ljava.lang.String;@c0ae5d5, ...)
but what I want is the nested contents themselves:
Array[Array[String]] = Array(Array(uid1,3,4), Array(uid2,4,5), ...)
How can I get that?
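For reference, here is a minimal sketch of one way to get that shape, under two assumptions not stated above: data1 is small enough to broadcast, and the hiveContext/table names are exactly as in my code. Two things work against the version above: appending to an outer var from inside map only mutates a serialized copy of the variable on each executor, so the driver never sees the updates, and printing an Array[String] shows the JVM's default toString ([Ljava.lang.String;@...). Broadcasting data1 and letting flatMap emit the matching rows avoids the shared variable entirely:

// Ship the collected table1 rows to every executor once.
val data1B = hiveContext.sparkContext.broadcast(data1)

// flatMap lets each table2 row emit zero or more result rows,
// so no shared mutable variable is needed.
val matched = data2.flatMap { case (id2, lon, lat) =>
  data1B.value.filter { r =>
    // same bounding-box test as above: |delta| <= 0.2 on both axes
    math.abs(lon - r(1).toDouble) <= 0.2 && math.abs(lat - r(2).toDouble) <= 0.2
  }.map(r => Array(id2, lon.toString, lat.toString, r(0), r(1), r(2)))
}

val result: Array[Array[String]] = matched.collect()

// arrays need explicit formatting before their contents are visible
println(result.map(_.mkString("Array(", ",", ")")).mkString("Array(", ", ", ")"))

If both tables are large, this all-pairs scan will not scale; a spatial join (for example, bucketing lon/lat into grid cells and joining on the cell key) would be the next step.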