Spark: Creating a nested schema

Time: 2019-07-17 15:24:10

Tags: apache-spark dataframe apache-spark-sql schema

In Spark, I have the following:

import spark.implicits._
val data = Seq(
  (1, ("value11", "value12")),
  (2, ("value21", "value22")),
  (3, ("value31", "value32"))
)

val df = data.toDF("id", "v1")
df.printSchema()

The result is:

root
|-- id: integer (nullable = false)
|-- v1: struct (nullable = true)
|    |-- _1: string (nullable = true)
|    |-- _2: string (nullable = true)

Now, if I want to create the schema myself, how should I define the nested field?

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", ???)
))

Thanks.

1 answer:

Answer 0 (score: 2)

Based on the example here: https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/types/StructType.html

 import org.apache.spark.sql._
 import org.apache.spark.sql.types._

 val innerStruct =
   StructType(
     StructField("f1", IntegerType, true) ::
     StructField("f2", LongType, false) ::
     StructField("f3", BooleanType, false) :: Nil)

 val struct = StructType(
   StructField("a", innerStruct, true) :: Nil)

 // Create a Row with the schema defined by struct
 val row = Row(Row(1, 2, true))
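
To check that the nesting came out as intended, the struct from the documentation example can be inspected directly, without a SparkSession. This is a minimal sketch that rebuilds the same innerStruct and struct as above; the names nestedType and innerNames are only illustrative.

```scala
import org.apache.spark.sql.types._

val innerStruct = StructType(
  StructField("f1", IntegerType, true) ::
  StructField("f2", LongType, false) ::
  StructField("f3", BooleanType, false) :: Nil)

val struct = StructType(
  StructField("a", innerStruct, true) :: Nil)

// Look up a field by name; its dataType is the nested StructType
val nestedType = struct("a").dataType

// List the field names of the nested struct
val innerNames = innerStruct.fieldNames

// Print the schema in the same tree format as df.printSchema()
struct.printTreeString()
```

A nested struct is just a StructField whose data type is itself a StructType, which is the pattern the question needs.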

In your case, that would be:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", StructType(Array(
      StructField("value1", StringType),
      StructField("value2", StringType)
  )))
))

Output (nullable defaults to true when it is omitted):

StructType(
  StructField(id,IntegerType,true), 
  StructField(nested,StructType(
    StructField(value1,StringType,true), 
    StructField(value2,StringType,true)
  ),true)
)
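
To actually use the schema, it can be passed to createDataFrame together with rows whose shape matches it. The following is a minimal sketch, assuming a local SparkSession (the app name and master setting are illustrative, not from the question) and reusing the sample values from the question. It can be pasted into spark-shell or run as a Scala script.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Assumption: a local session for experimentation; in spark-shell this reuses the existing one
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-schema-example")
  .getOrCreate()

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", StructType(Array(
    StructField("value1", StringType),
    StructField("value2", StringType)
  )))
))

// Each nested struct value is itself a Row
val rows = Seq(
  Row(1, Row("value11", "value12")),
  Row(2, Row("value21", "value22")),
  Row(3, Row("value31", "value32"))
)

// Build the DataFrame from an RDD[Row] and the explicit schema
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.printSchema()

// Nested fields are addressed with dot notation
val nestedValues = df.select("nested.value1")
nestedValues.show()
```

Note that the nested values are written as Row objects rather than tuples, because createDataFrame with an explicit schema works on Row data.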