RDD[Array[String]]

Date: 2018-08-02 03:46:20

Tags: scala apache-spark dataframe rdd

I'm trying to convert a DataFrame to an RDD[Array[String]] in Spark. Currently I'm doing it using the following approach:

case class Array[String](c0:Long, c1:Integer, c2:Long, c3:String, c4:Integer, c5:Integer, c6:Integer)

val newData = df.distinct.map {
  case Row(c0:Long, c1:Integer, c2:Long, c3:String, c4:Integer, c5:Integer, c6:Integer) => Array[String](c0:Long, c1:Integer, c2:Long, c3:String, c4:Integer, c5:Integer, c6:Integer)
}

val newRDD = newData.rdd

This gives me what appears to be a successful conversion from the DataFrame to an RDD[Array[String]]. However, when I wrap it in a function, like this:

 def caseNewRDD(df: DataFrame): RDD[Array[String]] ={
    case class Array[String](c0:Long, c1:Integer, c2:Long, c3:String, c4:Integer, c5:Integer, c6:Integer)
    val newData = df.distinct.map {
      case org.apache.spark.sql.Row(c0:Long, c1:Integer, c2:Long, c3:String, c4:Integer, c5:Integer, c6:Integer) => Array[String](c0:Long, c1:Integer, c2:Long, c3:String, c4:Integer, c5:Integer, c6:Integer)
    }
    val newRDD = newData.rdd
    newRDD
  }

I get the following error:

    Expression of type org.apache.spark.rdd.RDD[Array[scala.Predef.String]] doesn't conform to expected type org.apache.spark.rdd.RDD[scala.Array[scala.Predef.String]]

I'm guessing the type of array I'm generating is incorrect, but I can't work out why.

Any help would be greatly appreciated.

1 Answer:

Answer 0 (score: 2)

You can't cast types like that in Scala.

case class Array[String](c0:Long, c1:Integer, c2:Long, c3:String, c4:Integer, c5:Integer, c6:Integer)

means: declare a NEW case class named Array with a type parameter named String, which shadows both scala.Array and scala.Predef.String inside its scope. What you are trying to achieve is:

def caseNewRDD(df: DataFrame): RDD[Array[String]] = {
  df.distinct.map {
    case Row(c0:Long, c1:Integer, c2:Long, c3:String, c4:Integer, c5:Integer, c6:Integer) => 
      Array(c0.toString, c1.toString, c2.toString, c3, c4.toString, c5.toString, c6.toString)
  }.rdd
}
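To see the shadowing problem in isolation, here is a minimal, Spark-free sketch (the object and field names are purely illustrative); inside the object, the name Array no longer refers to a real array at all:

```scala
object ShadowDemo {
  // This declaration shadows scala.Array, and its type parameter
  // named `String` shadows scala.Predef.String within the class.
  case class Array[String](value: String)

  def main(args: scala.Array[scala.Predef.String]): Unit = {
    val a = Array("hello")            // builds the case class, not an array
    println(a.getClass.getSimpleName) // prints "Array": the case class

    // To get a genuine array here, the built-in type must be fully qualified:
    val real: scala.Array[scala.Predef.String] = scala.Array("hello")
    println(real.length)              // prints 1
  }
}
```

This is why the function's declared return type RDD[Array[String]] (the real scala.Array) no longer matches what the body produces once the local case class is in scope.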

That is, I explicitly convert the values to String, without creating a new type.
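As a side note, the same result can be reached by converting to the RDD first and mapping there, which sidesteps the Dataset Encoder machinery entirely. A sketch, under the assumption that stringifying every column is acceptable (the function name is illustrative):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Sketch: Row.toSeq yields the column values as Any,
// which are then stringified one by one.
def rowsToStringArrays(df: DataFrame): RDD[Array[String]] =
  df.distinct.rdd.map { row =>
    row.toSeq.map(v => if (v == null) null else v.toString).toArray
  }
```

This version also avoids pattern-matching on the exact column types, at the cost of losing the compile-time check that the schema is what you expect.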