How do I create a schema (StructType) that contains one or more StructTypes?

Time: 2017-10-04 15:17:14

Tags: scala apache-spark apache-spark-sql

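I am trying to create a StructType inside another StructType, but it only allows adding a StructField. I cannot find any way to add a StructType to it.

How do I create a StructType schema for the following string representation?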

struct<abc:struct<name:string>,pqr:struct<address:string>>

2 answers:

Answer 0: (score: 5)

A hidden feature of Spark SQL is the so-called Schema DSL, which lets you define a schema without stacking up nested parentheses.

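import org.apache.spark.sql.types._
// $"name".string needs spark.implicits._ in scope; spark-shell imports it automatically
import spark.implicits._
val name = new StructType().add($"name".string)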
scala> println(name.simpleString)
struct<name:string>

val address = new StructType().add($"address".string)
scala> println(address.simpleString)
struct<address:string>

val schema = new StructType().add("abc", name).add("pqr", address)
scala> println(schema.simpleString)
struct<abc:struct<name:string>,pqr:struct<address:string>>

scala> schema.simpleString == "struct<abc:struct<name:string>,pqr:struct<address:string>>"
res4: Boolean = true

scala> schema.printTreeString
root
 |-- abc: struct (nullable = true)
 |    |-- name: string (nullable = true)
 |-- pqr: struct (nullable = true)
 |    |-- address: string (nullable = true)
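
Outside spark-shell the `$"..."` syntax is not available unless `spark.implicits._` is imported, so the same schema can also be built with the plain `add(name, dataType)` overload. The following is a minimal sketch, assuming only that spark-sql is on the classpath; the object name `NestedSchemaWithAdd` is made up for illustration.

```scala
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object NestedSchemaWithAdd {
  // Inner structs, built with add(name, dataType); fields are nullable by default.
  val name: StructType = new StructType().add("name", StringType)
  val address: StructType = new StructType().add("address", StringType)

  // A StructType is itself a DataType, so it can be passed as a field's type.
  val schema: StructType = new StructType().add("abc", name).add("pqr", address)

  def main(args: Array[String]): Unit =
    println(schema.simpleString)
}
```

No SparkSession is needed here, because the schema classes are plain objects that do not touch a cluster.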

Answer 1: (score: 3)

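A StructField pairs a name with a data type, and a StructType is itself a data type, so you can nest one inside another like this: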

StructType(Seq(StructField("structName", StructType(Seq(StructField("name", StringType), StructField("address", StringType))))))
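
The snippet above puts both fields into a single nested struct. Applied to the schema from the question, the same constructor-based approach might look like the sketch below. It assumes spark-sql is on the classpath, and the object name `NestedSchemaWithConstructors` is made up for illustration.

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object NestedSchemaWithConstructors {
  // Each top-level field uses a StructType as its data type.
  // StructField defaults to nullable = true when the flag is omitted.
  val schema: StructType = StructType(Seq(
    StructField("abc", StructType(Seq(StructField("name", StringType)))),
    StructField("pqr", StructType(Seq(StructField("address", StringType))))
  ))

  def main(args: Array[String]): Unit =
    println(schema.simpleString)
}
```

This form is more verbose than chaining `add`, but it makes the nullability and nesting of every field explicit in one expression.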