Scala DataFrame: Explode an array

Asked: 2015-06-30 13:42:09

Tags: scala dataframe explode apache-spark-sql

I am using the spark libraries in Scala. I have created a DataFrame using

val searchArr = Array(
  StructField("log",IntegerType,true),
  StructField("user", StructType(Array(
    StructField("date",StringType,true),
    StructField("ua",StringType,true),
    StructField("ui",LongType,true))),true),
  StructField("what",StructType(Array(
    StructField("q1",ArrayType(IntegerType, true),true),
    StructField("q2",ArrayType(IntegerType, true),true),
    StructField("sid",StringType,true),
    StructField("url",StringType,true))),true),
  StructField("where",StructType(Array(
    StructField("o1",IntegerType,true),
    StructField("o2",IntegerType,true))),true)
)

val searchSt = new StructType(searchArr)    

val searchData = sqlContext.jsonFile(searchPath, searchSt)

I now want to explode the field what.q1, which should contain an array of integers, but the documentation is limited: http://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/DataFrame.html#explode(java.lang.String,%20java.lang.String,%20scala.Function1,%20scala.reflect.api.TypeTags.TypeTag)

So far I have tried a few things, without much luck:

val searchSplit = searchData.explode("q1", "rb")(q1 => q1.getList[Int](0).toArray())

Any ideas/examples of how to use explode on an array?

1 Answer:

Answer 0 (score: 0)

Have you tried using a UDF on the "what" field? Something like this might be useful:

import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.functions.{col, udf}

// Note: this UDF only extracts the first element of the struct's first
// field as a string; it does not generate extra rows the way explode does.
val explode = udf { (aStr: GenericRowWithSchema) =>
  aStr match {
    case null => ""
    case _    => aStr.getList[Int](0).get(0).toString
  }
}

val newDF = df.withColumn("newColumn", explode(col("what")))

Where:

  • getList(0) returns the "q1" field
  • get(0) returns the first element of "q1"

I am not sure, but you could try using getAs[T](fieldName: String) instead of getList(index: Int).
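If the goal is to turn each element of what.q1 into its own row (the UDF above only extracts a single value per row), a possible sketch, assuming the searchData DataFrame from the question and that the explode lambda receives the array value itself (a Seq[Int]) rather than a Row:

```scala
import org.apache.spark.sql.functions.{col, explode}

// Option 1: the built-in explode function from org.apache.spark.sql.functions,
// which emits one row per array element into a new column "rb".
val exploded = searchData.select(col("*"), explode(col("what.q1")).as("rb"))

// Option 2: DataFrame.explode, as attempted in the question. The lambda
// receives the column value directly (a Seq[Int] for an ArrayType(IntegerType)
// column), not a Row, and returns one output element per generated row.
val searchSplit = searchData.explode("what.q1", "rb") { q1: Seq[Int] => q1 }
```

Whether explode accepts a nested field name like "what.q1" may depend on the Spark version; if it does not, select what.q1 into a top-level column first and explode that.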