For example, the student column has the type StructType(("id", "int"), ("name", "string")):
| student |
| ------------|
| [123,james] |
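How can I use Dataset<Row>.map() to convert a column value into a Student class instance? Should I treat the column's value as a string array and parse it to construct the instance?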
Answer 0 (score: 1)
A nested column value is itself a Row, so we can read its fields by name and then construct an instance from them.
Here is some demo code:
scala> val df = Seq((1, "james"), (2, "tony")).toDF("id", "name")
df: org.apache.spark.sql.DataFrame = [id: int, name: string]
scala> val dd = df.select(struct("*").alias("students"))
dd: org.apache.spark.sql.DataFrame = [students: struct<id: int, name: string>]
scala> dd.show()
+--------------------+
| students|
+--------------------+
| [1,james] |
| [2,tony] |
+--------------------+
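scala> val rows = dd.collect()  // assumed definition; the original transcript does not show where rows comes from
scala> rows(0).getStruct(0)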
res9: org.apache.spark.sql.Row = [1,james]
As we can see, the cell value returned by rows(0).getStruct(0) is a Row.
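The transcript above stops at retrieving the nested Row. A minimal sketch of the remaining step, reading the fields by name and building an instance inside map(), might look like the following. The Student case class, the object name NestedRowToStudent, and the helper toStudent are illustrative names that do not appear in the answer, and the local SparkSession settings are assumptions made only so the example runs standalone.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

// Hypothetical case class mirroring the struct's two fields.
case class Student(id: Int, name: String)

object NestedRowToStudent {
  // Read the struct's fields by name from the nested Row.
  // This relies on the Row carrying a schema, which is the case
  // for rows that come out of a DataFrame.
  def toStudent(r: Row): Student =
    Student(r.getAs[Int]("id"), r.getAs[String]("name"))

  def main(args: Array[String]): Unit = {
    // Assumed local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-row-demo")
      .getOrCreate()
    import spark.implicits._ // provides the Encoder[Student] that map() needs

    val dd = Seq((1, "james"), (2, "tony")).toDF("id", "name")
      .select(struct(col("id"), col("name")).alias("students"))

    // Each outer Row has a single struct column; getStruct(0) returns the nested Row.
    val students = dd.map(row => toStudent(row.getStruct(0)))

    students.collect().foreach(println)
    spark.stop()
  }
}
```

So there is no need to treat the cell as a string array and parse it: the nested Row already exposes typed accessors.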
Answer 1 (score: 0)
Use an encoder, then collect.
scala> val df = Seq((1, "james"), (2, "tony")).toDF("id", "name")
df: org.apache.spark.sql.DataFrame = [id: int, name: string]
scala> val dd = df.select(struct("*").alias("students"))
dd: org.apache.spark.sql.DataFrame = [students: struct<id: int, name: string>]
scala> dd.show()
+--------------------+
| students|
+--------------------+
| [1,james] |
| [2,tony] |
+--------------------+
scala> case class Student(id: Int, name: String)
defined class Student
scala> dd.select("students.*").as[Student].collectAsList
res6: java.util.List[Student] = [Student(1,james), Student(2,tony)]
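In spark-shell the encoder used by as[Student] is supplied implicitly. As a sketch of the same idea outside the shell, the encoder can be passed explicitly with Encoders.product. The object name StructToTypedDataset and the helper toStudents are hypothetical, and the sketch assumes the struct column is called students and that its field names match the case class.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

// Hypothetical case class; its field names must match the struct's fields.
case class Student(id: Int, name: String)

object StructToTypedDataset {
  // Flatten the "students" struct into top-level columns, then attach an
  // explicit product encoder to obtain a typed Dataset[Student].
  def toStudents(dd: DataFrame): Dataset[Student] =
    dd.select("students.*").as[Student](Encoders.product[Student])

  def main(args: Array[String]): Unit = {
    // Assumed local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("encoder-demo")
      .getOrCreate()
    import spark.implicits._ // needed here only for toDF

    val dd = Seq((1, "james"), (2, "tony")).toDF("id", "name")
      .select(struct(col("id"), col("name")).alias("students"))

    // The result stays a distributed Dataset until collect() is called.
    toStudents(dd).collect().foreach(println)
    spark.stop()
  }
}
```

Returning a Dataset[Student] rather than a collected list keeps the data distributed, so collecting can be deferred until the results are known to fit on the driver.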