I have a DataFrame like this:
root
 |-- midx: double (nullable = true)
 |-- future: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: long (nullable = false)
 |    |    |-- _2: long (nullable = false)
With the code below I am trying to transform it into a DataFrame with the schema that follows:
val T = withFfutures.where($"midx" === 47.0).select("midx","future").collect().map((row: Row) =>
  Row {
    row.getAs[Seq[Row]]("future").map { case Row(e: Long, f: Long) =>
      (row.getAs[Double]("midx"), e, f)
    }
  }
).toList
root
 |-- id: double (nullable = true)
 |-- event: long (nullable = true)
 |-- future: long (nullable = true)
So the plan is to turn the array of (event, future) pairs into a DataFrame that has those two fields as columns. I am trying to convert T into a DataFrame like this:
val schema = StructType(Seq(
  StructField("id", DoubleType, nullable = true),
  StructField("event", LongType, nullable = true),
  StructField("future", LongType, nullable = true)
))
val df = sqlContext.createDataFrame(context.parallelize(T), schema)
But when I do this and try to inspect df, I get an error.
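Answer 0 (score: 0)
After a while I found the problem: the array of structs in the column has to be read as a sequence of Row values, and each (event, future) pair has to become its own tuple instead of being wrapped in a single Row. So the final code to build the final DataFrame should look like this: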
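import org.apache.spark.sql.Row
import sqlContext.implicits._ // provides toDF on an RDD of tuples

// Emit one (id, event, future) tuple per element of the future array.
val T = withFfutures.select("midx", "future").collect().flatMap { (row: Row) =>
  row.getAs[Seq[Row]]("future").map { case Row(e: Long, f: Long) =>
    (row.getAs[Double]("midx"), e, f)
  }
}.toList
val all = context.parallelize(T).toDF("id", "event", "future")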
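The difference between the two versions comes down to map versus flatMap. The following is a minimal sketch of that difference using plain Scala collections only, so it runs without Spark. The names FlattenSketch, Input, nested and flat are hypothetical and exist only for this illustration; Input stands in for one collected row of the original DataFrame.

```scala
// Plain Scala collections stand in for the collected Spark rows
// (an assumption made so the sketch runs without a Spark session).
object FlattenSketch {
  // Hypothetical stand-in for one collected row: midx plus its (event, future) pairs.
  final case class Input(midx: Double, future: Seq[(Long, Long)])

  // map keeps one element per input row, so each result is still a nested sequence.
  def nested(rows: Seq[Input]): List[Seq[(Double, Long, Long)]] =
    rows.map(r => r.future.map { case (e, f) => (r.midx, e, f) }).toList

  // flatMap emits one (id, event, future) tuple per array element.
  def flat(rows: Seq[Input]): List[(Double, Long, Long)] =
    rows.flatMap(r => r.future.map { case (e, f) => (r.midx, e, f) }).toList

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Input(47.0, Seq((1L, 2L), (3L, 4L))),
      Input(48.0, Seq((5L, 6L)))
    )
    println(nested(rows)) // two entries, each holding a sequence of tuples
    println(flat(rows))   // three entries, one per (event, future) pair
  }
}
```

Only the flat result has the shape that a three-column (id, event, future) schema expects, which is presumably why the map-based version in the question did not match its schema.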