Creating a struct field in a row of a DataFrame

Time: 2018-09-26 21:16:14

Tags: scala apache-spark

In the code below I am trying to create a Spark DataFrame that has a struct field. What should I put in place of ??? to make it work?

import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

val spark: SparkSession = SparkSession.builder()
  .appName("NodesLanesTest")
  .getOrCreate()
val someData = Seq(
  Row(1538161836000L, 1538075436000L, "cargo3", 3L, ???("Chicago", "1234"))
)
val someSchema = StructType(
  List(
    StructField("ata", LongType, nullable = false),
    StructField("atd", LongType, nullable = false),
    StructField("cargo", StringType, nullable = false),
    StructField("createdDate", LongType, nullable = false),
    StructField("destination",
      StructType(List(
        StructField("name", StringType, nullable = false),
        StructField("uuid", StringType, nullable = false)
      )))
  )
)
val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  someSchema
)

1 answer:

Answer 0 (score: 1)

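You are missing a Row object. When a DataFrame is created from a sequence of Row objects, the value of a StructType column must itself be a nested Row, so this should work for you: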

   
val someData = Seq(
  Row(1538161836000L, 1538075436000L, "cargo3", 3L, Row("Chicago", "1234"))
)
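
For reference, here is a minimal end-to-end sketch that combines the question's schema with the nested Row and then reads the struct field back. The `.master("local[*]")` setting and the names `destinationType` and `destinationName` are assumptions added for illustration; they are not part of the original code.

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SparkSession}

// Assumption: run locally; the question does not configure a master.
val spark = SparkSession.builder()
  .appName("NodesLanesTest")
  .master("local[*]")
  .getOrCreate()

// Schema of the nested struct, pulled out into its own value for readability.
val destinationType = StructType(List(
  StructField("name", StringType, nullable = false),
  StructField("uuid", StringType, nullable = false)
))

val someSchema = StructType(List(
  StructField("ata", LongType, nullable = false),
  StructField("atd", LongType, nullable = false),
  StructField("cargo", StringType, nullable = false),
  StructField("createdDate", LongType, nullable = false),
  StructField("destination", destinationType)
))

// The struct column is supplied as a nested Row whose values
// follow the field order of destinationType.
val someData = Seq(
  Row(1538161836000L, 1538075436000L, "cargo3", 3L, Row("Chicago", "1234"))
)

val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  someSchema
)

someDF.printSchema()

// Nested fields can be selected with dot notation.
val destinationName = someDF.select("destination.name").head().getString(0)
```

The values inside the nested Row are matched to the struct's fields by position, so they must appear in the same order as the fields in the schema.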

Hope this helps.