How to iterate over an array in a Spark RDD

Asked: 2019-05-21 13:46:02

Tags: scala apache-spark

How do I iterate over an array inside a Spark RDD?

import scala.collection.mutable.ArrayBuffer

// Accumulator for matched rows; start empty rather than null so ++= below works
var dataResult: Array[Array[String]] = Array.empty

// Materialize table1 on the driver as Array[Array[String]]
val data1 = hiveContext.sql("select id, lon, lat from table1").rdd.map(
  row => (row.getAs[String]("id"), row.getAs[Double]("lon"), row.getAs[Double]("lat"))
).map(
  u => Array(u._1, u._2.toString, u._3.toString)
).collect()

// table2 stays distributed as an RDD of (id2, lon, lat)
val data2 = hiveContext.sql("select id2, lon, lat from table2").rdd.map(
  row => (row.getAs[String]("id2"), row.getAs[Double]("lon"), row.getAs[Double]("lat"))
)

var data3 = data2.map(u => {
  // Compare each data2 row against every data1 row within +/-0.2 on lon and lat.
  // Note: dataResult is a driver-side variable, so mutating it inside this
  // executor closure will not propagate back to the driver.
  for (i <- data1.indices) {
    if (u._2 + 0.2 >= data1(i)(1).toDouble && u._2 - 0.2 <= data1(i)(1).toDouble &&
        u._3 + 0.2 >= data1(i)(2).toDouble && u._3 - 0.2 <= data1(i)(2).toDouble) {
      dataResult ++= ArrayBuffer(Array(u._1, u._2.toString, u._3.toString,
        data1(i)(0), data1(i)(1), data1(i)(2)))
    }
  }
  dataResult
})

The output I get is:

 Array[String] = Array([Ljava.lang.String;@c0ae5d5, [Ljava.lang.String;@c0ae5d5......
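Those `[Ljava.lang.String;@...` entries are just the JVM's default toString for arrays, so the inner elements may still hold the right values even though they print opaquely. A quick standalone illustration:

    val row = Array("uid1", "3", "4")
    println(row)               // prints something like [Ljava.lang.String;@1b6d3586
    println(row.mkString(",")) // prints uid1,3,4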

But what I want is:

Array[String] = Array(Array(uid1,3,4), Array(uid2,4,5), ...)
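One way to get that shape is to broadcast the collected data1 and flatMap over data2, emitting one flattened row per match instead of mutating a shared variable. A minimal sketch, assuming data1 is small enough to ship to every executor and that sc is the underlying SparkContext (hiveContext.sparkContext here); the names data1B and matches are illustrative:

    val data1B = sc.broadcast(data1)

    val matches = data2.flatMap { case (id2, lon, lat) =>
      // Scala's Array.collect (a filter + map), not RDD.collect: emit one
      // flattened Array[String] per data1 row within +/-0.2 on both axes
      data1B.value.collect {
        case ref if math.abs(lon - ref(1).toDouble) <= 0.2 &&
                    math.abs(lat - ref(2).toDouble) <= 0.2 =>
          Array(id2, lon.toString, lat.toString, ref(0), ref(1), ref(2))
      }
    }

    val result: Array[Array[String]] = matches.collect()

Because each input row emits only its own matches, no shared state is mutated, and the collected result has the Array(Array(...), Array(...), ...) shape. If data1 were too large to broadcast, a join on the two DataFrames would be the usual alternative.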

0 Answers:

There are no answers yet.