Exploding a nested DataFrame column in Spark Scala

Asked: 2018-06-15 06:01:49

Tags: scala apache-spark apache-spark-sql

The column is named 'col1' and has the following schema:

col1: array (nullable = true)
|     |-- A1: struct (containsNull = true)
|     |       |-- B0: struct (nullable = true)
|     |       |    |-- B01: string (nullable = true)
|     |       |    |-- B02: string (nullable = true)
|     |       |-- B1: string (nullable = true)
|     |       |-- B2: string (nullable = true)
|     |       |-- B3: string (nullable = true)
|     |       |-- B4: string (nullable = true)
|     |       |-- B5: string (nullable = true)

To get the value of B2, I first tried a couple of things. Code:

val explodeDF = test_df.explode($"col1") { case Row(col1_details: Array[String]) =>
  col1_details.map { col1_details =>
    val firstName = col1_details(2).asInstanceOf[String]
    val lastName = col1_details(3).asInstanceOf[String]
    val email = col1_details(4).asInstanceOf[String]
    val salary = col1_details(5).asInstanceOf[String]
    notes_details(firstName, lastName, email, salary)
  }
}

Error:

error: too many arguments for method apply: (index: Int)Char in class StringOps
     col1_details(firstName, lastName, email, salary)

I have tried various snippets, but I run into different errors each time. Any suggestions on how to resolve this error would be appreciated.
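For reference, a common alternative for this kind of schema is the `explode` function from `org.apache.spark.sql.functions` rather than the (deprecated) `DataFrame.explode` method. A minimal sketch, assuming Spark 2.x, that `test_df` is the DataFrame from the question, and that nested fields can be addressed by dotted path:

```scala
import org.apache.spark.sql.functions.explode

// Explode the array so that each struct element of col1 becomes its own row,
// then select the nested fields out of the struct by name.
val exploded = test_df
  .select(explode($"col1").as("col1_item"))   // one row per array element
  .select(
    $"col1_item.B2".as("B2"),                 // top-level struct field
    $"col1_item.B0.B01".as("B01")             // field nested one level deeper
  )
```

This avoids pattern-matching on `Row` entirely; Spark resolves the struct fields by name, so there is no need for positional `asInstanceOf[String]` casts.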

0 Answers:

No answers yet