I have multiple JSON files that hold my data. The JSON structure looks like this:
root
|-- Age: long (nullable = true)
|-- Company: struct (nullable = true)
| |-- Company Name: string (nullable = true)
| |-- Domain: string (nullable = true)
|-- Designation: string (nullable = true)
|-- Email: string (nullable = true)
|-- Name: string (nullable = true)
|-- Test: array (nullable = true)
| |-- element: string (containsNull = true)
|-- location: struct (nullable = true)
| |-- City: struct (nullable = true)
| | |-- City Name: string (nullable = true)
| | |-- Pin: long (nullable = true)
| |-- State: string (nullable = true)
I tried reading the files, and the table I get looks like this:
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
|Age| Company| Designation| Email| Name| Test| location|
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
| 22|[Elegant,Java]|Trainee Programmer|vpn2330@gmail.com|Vipin Suman|[Test1, Test2]|[[Ahmedabad,32400...|
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
I want the result to look like this:
Age | Company Name     | Domain | Designation | Email             | Name        | Test  | City Name | Pin    | State
22  | Elegant MicroWeb | Java   | Programmer  | vpn2330@gmail.com | Vipin Suman | Test1 | Ahmedabad | 324009 | Gujarat
22  | Elegant MicroWeb | Java   | Programmer  | vpn2330@gmail.com | Vipin Suman | Test2 | Ahmedabad | 324009 | Gujarat
How can I get a table like the one above? I have tried everything. I am new to Apache Spark; can anyone help me?
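Answer 0 (score: 0)
I suggest doing this in Scala, which is better supported by Spark. To get there, you can use the "select" API to pick specific columns and "alias" to rename them. For selecting from complex data formats, see this post (https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html)
Based on the result you want, you also need the "explode" API (see "Flattening Rows in Spark").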
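To make the two steps concrete, here is a minimal plain-Java sketch (no Spark involved) of what lifting struct fields and exploding an array do to a single record. The names `FlattenSketch`, `flattenStructs`, `explode`, and `sampleRecord` are made up for this illustration and are not Spark APIs; the sample values are taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names are Spark APIs.
class FlattenSketch {

    // Lifts the fields of nested maps to the top level, recursively
    // (roughly what selecting "Company.*" and "location.City.*" achieves).
    static Map<String, Object> flattenStructs(Map<String, Object> record) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            if (entry.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) entry.getValue();
                flat.putAll(flattenStructs(nested));
            } else {
                flat.put(entry.getKey(), entry.getValue());
            }
        }
        return flat;
    }

    // Emits one row per element of the list stored under arrayField.
    // A missing or empty list yields no rows, as Spark's explode does.
    static List<Map<String, Object>> explode(Map<String, Object> row, String arrayField) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object value = row.get(arrayField);
        if (!(value instanceof List)) {
            return out;
        }
        for (Object element : (List<?>) value) {
            Map<String, Object> copy = new LinkedHashMap<>(row);
            copy.put(arrayField, element); // replace the array with a single element
            out.add(copy);
        }
        return out;
    }

    // One record shaped like the schema in the question (sample values).
    static Map<String, Object> sampleRecord() {
        Map<String, Object> company = new LinkedHashMap<>();
        company.put("Company Name", "Elegant");
        company.put("Domain", "Java");

        Map<String, Object> city = new LinkedHashMap<>();
        city.put("City Name", "Ahmedabad");
        city.put("Pin", 324009L);

        Map<String, Object> location = new LinkedHashMap<>();
        location.put("City", city);
        location.put("State", "Gujarat");

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("Age", 22L);
        record.put("Company", company);
        record.put("Designation", "Trainee Programmer");
        record.put("Email", "vpn2330@gmail.com");
        record.put("Name", "Vipin Suman");
        record.put("Test", Arrays.asList("Test1", "Test2"));
        record.put("location", location);
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> flat = flattenStructs(sampleRecord());
        for (Map<String, Object> row : explode(flat, "Test")) {
            System.out.println(row);
        }
    }
}
```

The record with a two-element Test array becomes two flat rows that differ only in the Test column, which is the shape the question asks for. In Spark itself the same effect comes from select or selectExpr combined with explode rather than from hand-written loops.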
Answer 1 (score: 0)
In Scala it can be done like this:
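// Assumes `people` is the DataFrame read from the JSON files and that the
// SparkSession is named `spark` (needed for the $"..." column syntax).
import org.apache.spark.sql.functions.explode
import spark.implicits._

people.select(
  $"Age",
  $"Company.*",                 // expands to Company Name and Domain
  $"Designation",
  $"Email",
  $"Name",
  explode($"Test").as("Test"),  // one row per array element; otherwise the column is named "col"
  $"location.City.*",           // expands to City Name and Pin
  $"location.State")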
Unfortunately, the following code fails in Java:
people.select(
people.col("Age"),
people.col("Company.*"),
people.col("Designation"),
people.col("Email"),
people.col("Name"),
explode(people.col("Test")),
people.col("location.City.*"),
people.col("location.State"));
You can use selectExpr instead:
people.selectExpr(
"Age",
"Company.*",
"Designation",
"Email",
"Name",
"EXPLODE(Test) AS Test",
"location.City.*",
"location.State");
PS:
Instead of a list of JSON files, you can pass the path to a directory in sparkSession.read().json(jsonFiles);