Creating a nested struct schema in Spark

Time: 2019-03-01 06:58:15

Tags: apache-spark apache-spark-sql

Existing columns of the DataFrame:

|-- col1: string (nullable = true)
|-- col2: string (nullable = true)
|-- col3: struct (nullable = true)
|    |-- col3_1: struct (nullable = true)
|    |    |-- colA: string (nullable = true)
|    |-- col3_2: struct (nullable = true)
|    |    |-- colB: string (nullable = true)
|-- col4: string (nullable = true)
|-- col5: string (nullable = true)

I only need to read the following columns:

col1, col2, col3

For the first two columns, I can create the following schema:

val schema = StructType(Array(StructField("col1", StringType), StructField("col2", LongType)))

My attempt at a schema that includes the nested struct:

StructType(Array(StructField("col1", StringType), 
StructField("col3", StructType(StructField("col3_1",StructType(StructField("colA",StringType))),StructField("col3_2",StructType(StructField("colB",StringType)))))

Error:

error: overloaded method value apply with alternatives:

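Any suggestions on how to create a schema for the nested struct?

1 answer:

Answer 0 (score: 0)

Try something like the following, or declare a case class for col3 and use it in your schema: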

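import org.apache.spark.sql.types._

// StructType.apply expects a Seq (or Array) of StructField, so every nested
// level must be wrapped in Seq(...). Passing a bare StructField is what
// triggers the "overloaded method value apply" error.
// Types and nullability follow the printSchema output in the question.
val schema = StructType(Seq(
  StructField("col1", StringType, nullable = true),
  StructField("col2", StringType, nullable = true),
  StructField("col3", StructType(Seq(
    StructField("col3_1", StructType(Seq(
      StructField("colA", StringType, nullable = true)
    ))),
    StructField("col3_2", StructType(Seq(
      StructField("colB", StringType, nullable = true)
    )))
  )))
))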
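
The case class alternative mentioned above could look roughly like the sketch below. The names Col3_1, Col3_2, Col3 and Record are hypothetical and chosen only to mirror the columns in the question. The sketch assumes the case classes are defined at the top level (or in spark-shell), since Encoders.product needs a TypeTag for them.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Hypothetical case classes mirroring only the columns to read:
// col1, col2 and the nested col3.
case class Col3_1(colA: String)
case class Col3_2(colB: String)
case class Col3(col3_1: Col3_1, col3_2: Col3_2)
case class Record(col1: String, col2: String, col3: Col3)

// Derive the nested StructType from the case class definitions
// instead of writing the StructField tree by hand.
val derivedSchema: StructType = Encoders.product[Record].schema

// Print the derived schema as a tree to compare it with the
// DataFrame's printSchema output.
derivedSchema.printTreeString()
```

The derived schema can then be passed to spark.read.schema(derivedSchema) before the format-specific reader (for example .json(...) or .parquet(...)), in the same way as a hand-written StructType. Whether this is preferable to the explicit version depends on how often the structure changes. The case classes are shorter, but the explicit StructType gives direct control over each field's type and nullability.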