My dataset ds has the following schema:
root
|-- id: string (nullable = true)
|-- type: string (nullable = true)
|-- item: struct (nullable = true)
| |-- item: string (nullable = true)
Sample data:
{"id":"1","type": "aaa", "item": {"item":"11"}}
{"id":"2","type": "bbb", "item" : {"item":"12"}}
How can I retrieve item from the struct to get this result?
id type item
1 aaa 11
2 bbb 12
Here is my failed attempt:
ds.select("id", "type", "item.0");
Note that I am using Java. Please do not post answers in Scala or Python unless the answer is the same in Java.
Answer 0 (score: 2)
Assuming you have this sample file:
{"id":"1","type": "aaa", "item": {"item":"11"}}
{"id":"2","type": "bbb", "item" : {"item":"12"}}
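You can test the following Java code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkJavaTest {
    public static SparkSession spark = SparkSession
            .builder()
            .appName("JavaSparkTest")
            .master("local")
            .getOrCreate();

    public static void main(String[] args) {
        Dataset<Row> ds1 = spark.read().json("sample.json");
        ds1.printSchema();
        // Nested struct fields are selected by name using dot notation.
        ds1.select("id", "type", "item.item").show(false);
    }
}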
The result will be:
root
|-- id: string (nullable = true)
|-- item: struct (nullable = true)
| |-- item: string (nullable = true)
|-- type: string (nullable = true)
+---+----+----+
|id |type|item|
+---+----+----+
|1 |aaa |11 |
|2 |bbb |12 |
+---+----+----+
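
The select above works because struct fields are addressed by name (item.item), not by position as in the attempted "item.0". If you prefer Column expressions, the same selection can be sketched as follows. This is a minimal sketch, not the only way to do it: the class name NestedFieldSelect and the helper flatten are hypothetical names chosen for illustration, the records are supplied inline instead of being read from sample.json, and it assumes Spark 2.2 or later, where read().json accepts a Dataset<String>.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

// Hypothetical class name, used only for this illustration.
public class NestedFieldSelect {

    // Hypothetical helper: pulls the nested field item.item up to a
    // top-level column explicitly named "item".
    public static Dataset<Row> flatten(Dataset<Row> ds) {
        return ds.select(col("id"), col("type"), col("item.item").alias("item"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("NestedFieldSelect")
                .master("local[1]")
                .getOrCreate();

        // The same two records as the sample file, supplied inline as JSON strings.
        List<String> json = Arrays.asList(
                "{\"id\":\"1\",\"type\":\"aaa\",\"item\":{\"item\":\"11\"}}",
                "{\"id\":\"2\",\"type\":\"bbb\",\"item\":{\"item\":\"12\"}}");
        Dataset<Row> ds = spark.read().json(spark.createDataset(json, Encoders.STRING()));

        flatten(ds).show(false);
        spark.stop();
    }
}
```

The explicit alias is optional here, since selecting item.item already yields a column named item, but it makes the output column name independent of the nested field's name.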