Question

I want to create a schema with this structure:

|    |-- Features: struct (nullable = true)
|    |    |-- Feature: array (nullable = true)
|    |    |    |-- element: string (containsNull = true)

This is my code:

StructField( "Features", StructType(
        Array(
          StructField( "Feature", ArrayType(
            StructType(
              Array(
                StructField( "element", StringType, true )
              )
            )
          ) )
        )
      ), true )

Result:

|    |-- Features: struct (nullable = true)
|    |    |-- Feature: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- element: string (nullable = true)

Any ideas?

Answer 1

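You should omit the innermost struct. The "element" line in the printSchema output is not a field; it is how Spark displays the element type of an array. Pass StringType directly to ArrayType: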

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val schema = StructType(Seq(StructField("Features", StructType(Seq(
  StructField("Feature", ArrayType(StringType))
)))))

spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema).printSchema
// root
//  |-- Features: struct (nullable = true)
//  |    |-- Feature: array (nullable = true)
//  |    |    |-- element: string (containsNull = true)
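For comparison, here is a minimal sketch of the same schema written with the StructType builder methods, which some people find easier to read than nested Seq(...) calls. The value names `features` and `builderSchema` are only illustrative, and the snippet assumes it is pasted into spark-shell or another environment with Spark SQL on the classpath. It needs no SparkSession, because `treeString` renders the tree straight from the schema object.

```scala
import org.apache.spark.sql.types._

// Inner struct: a single field "Feature" that is an array of strings.
// ArrayType takes the element type directly, so no StructType wrapper is needed.
val features = new StructType()
  .add("Feature", ArrayType(StringType, containsNull = true), nullable = true)

// Outer struct: a single field "Features" holding the inner struct.
val builderSchema = new StructType()
  .add("Features", features, nullable = true)

// Renders the same tree layout that printSchema prints for a DataFrame.
println(builderSchema.treeString)
```

Wrapping the element type in a StructType, as in the question, declares an array of structs that each have a string field named "element", which is why the extra level appeared in the output.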
