我需要从sql查询中获取属性/列沿袭。
基本上,我的意思是沿袭是“输出与输入”列中的关系。让我们看看下面的示例。
我有两个桌子
餐桌顺序
OrderID CustomerID OrderDate
10308 2 9/18/1996
10309 37 9/19/1996
10310 77 9/20/1996
表客户
CustomerID CustomerName ContactName Country
1 Riya Maria Anders Germany
2 John Ana Trujillo Mexico
3 Anthany Antonio Moreno Mexico
我使用sparksession将两个表都读取为DataFrame并转换为tempView
var oDF = sparkSession.
read.
format("csv").
option("header", EdlConstants.TRUE).
option("inferschema", EdlConstants.TRUE).
option("delimiter", ",").
option("decoding", EdlConstants.UTF8).
option("multiline", true).
load("C:\\Users\\tneja\\Trunk\\scripts\\el\\el-dt\\src\\test\\resources\\files\\order.csv")
println("smaple data in oDF====>")
oDF.show()
var cusDF = sparkSession.
read.
format("csv").
option("header", EdlConstants.TRUE).
option("inferschema", EdlConstants.TRUE).
option("delimiter", ",").
option("decoding", EdlConstants.UTF8).
option("multiline", true).
load("C:\\Users\\tneja\\Trunk\\scripts\\el\\el-dt\\src\\test\\resources\\files\\customer.csv")
println("smaple data in cusDF====>")
cusDF.show()
oDF.createOrReplaceTempView("orderTempView")
cusDF.createOrReplaceTempView("customerTempView")
现在,我要同时连接表和在其上编写查询,并在select中提供别名,如下所示
val res = sqlContext.sql("select OID as OID_new, CID as CID_new from (select ot.OrderID as OID,ct.CustomerID as CID,ot.OrderID+ct.CustomerName as MID from orderTempView as ot inner join customerTempView as ct on ot.CustomerID = ct.CustomerID)")
val analyzedPlan = res.queryExecution.analyzed
分析计划如下所示。
Project [OID#36 AS OID_new#39, CID#37 AS CID_new#40]
+- Project [OrderID#0 AS OID#36, CustomerID#15 AS CID#37, (cast(OrderID#0 as
double) + cast(CustomerName#16 as double)) AS MID#38]
+- Join Inner, (CustomerID#1 = CustomerID#15)
:- SubqueryAlias ot
: +- SubqueryAlias ordertempview
: +- Relation[OrderID#0,CustomerID#1,OrderDate#2] csv
+- SubqueryAlias ct
+- SubqueryAlias customertempview
+- Relation[CustomerID#15,CustomerName#16,ContactName#17,Country#18] csv
似乎这种逻辑计划是树结构。
所以我需要提取输出和输入的属性值。
所以我需要在Map中获取如下所示的值以获取关系
Map("OID_NEW" -> "orderTempView.OrderID", "CID_NEW"->"customerTempView.CustomerID")
其中OID_NEW是输出列,orderTempView.OrderID是它的输入列,与其他列相同。
问题在于如何遍历此逻辑计划(analyzedPlan)。如何以有用的方式从中提取数据?
如果有人可以帮助我从analyticsPlan获得类似这样的输出,那将是非常感谢。 谢谢!