我有一个包含200万个字段的表,该表已在Spark Dataframe中注册。
表格如下:
<div class="content">
<div class="c-dropdown text-right">
<span class="c-dropbtn mr-2">fs</span>
<div class="c-dropdown-content">
<div class="c-dropdown-item">View</div>
<div class="c-dropdown-item">Delete</div>
</div>
</div>
</div>
<div class="content">
<div class="c-dropdown text-right">
<span class="c-dropbtn mr-2">fs</span>
<div class="c-dropdown-content">
<div class="c-dropdown-item">View</div>
<div class="c-dropdown-item">Delete</div>
</div>
</div>
</div>
我想转换为JSON,并且JSON文件应该看起来像这样,很抱歉,我已经手工设计了:
CUSTADDRESSID ADDRESSTYPE ADDRESSLINE1 ADDRESSLINE2 ADDRESSLINE3 CITY STATE COUNTRY ZIP1 ISACTIVE ISCOMMUNICATION CREATEDDATE CREATEDUSER UPDATEDDATE UPDATEDUSER REASONCODE ZIP2 C_ACCNO CUSTOMERID ACCOUNTGROUPID PREPAIDACCOUNTSTATUSID PREPAIDACCOUNTSTATUSDATE SOURCEOFENTRY REVENUECATEGORYID VEHICLENUMBER VEHICLECLASS SERIALNO HEXTAGID TAGSTATUS TAGSTARTEFFDATE TAGENDEFFDATE ISTAGBLACKLISTED ISBLACKLISTHOLD RCVERIFICATIONSTATUS EMAILADDRESS PHONENUMBER CCreatedDate CCreatedUser CUpdatedDate CUpdatedUser HISTID ACTION ISFEEWAIVER FEEWAIVERPASSTYPE VEHICLEIMGVERIFICATIONSTATUS TAGTID ISREVENUERECHARGE RowNumber
41 Mailing B309 PROGRESSIVE SIGNATURE SECTOR-6 GHANSOLI NAVI MUMBAI MH IND 400701 1 1 2013-06-07 12:55:54.827 bhagwadapos 2013-06-07 12:55:54.827 bhagwadapos NULL NULL 10003014 20000001 15 3079 2015-09-16 14:58:27.500 RegularRetailer 75 MH43AJ411 4 206158433290 91890704803000000C0A TAGINACTIVE 2014-08-08 14:24:12.227 2039-08-08 23:59:59.000 1 0 NULL shankarn75@rediffmail.com 9004419178 2013-06-07 12:56:16.650 bhagwadapos 2015-09-16 14:58:33.190 BatchProcess 15250 UPDATE NULL NULL NULL NULL NULL 1
我已经编写了Spark SQL查询,但是无法在ACCOUNTNO : 10003018
ADDRESS : Array
0 : Object
VEHICLE : Array
0 : Object
下创建两个数组VEHICLE
和ADDRESS
这是查询:
ACCOUNTNO
及其后:
val query2 = "SELECT C_ACCNO AS ACCOUNTNO, collect_set(struct(VEHICLENUMBER, CUSTOMERID,ACCOUNTGROUPID,PREPAIDACCOUNTSTATUSID,PREPAIDACCOUNTSTATUSDATE,SOURCEOFENTRY,REVENUECATEGORYID,VEHICLECLASS,SERIALNO,HEXTAGID,TAGSTATUS,TAGSTARTEFFDATE,TAGENDEFFDATE,ISTAGBLACKLISTED,ISBLACKLISTHOLD,RCVERIFICATIONSTATUS,EMAILADDRESS,PHONENUMBER,CREATEDDATE,CREATEDUSER,UPDATEDDATE,UPDATEDUSER,ISFEEWAIVER,FEEWAIVERPASSTYPE,VEHICLEIMGVERIFICATIONSTATUS,TAGTID,ISREVENUERECHARGE)) as VEHICLE FROM joined_acc_add GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO"
我需要帮助才能在查询中找到我的错误。该查询抛出错误。
答案 0 :(得分:1)
我做完了。在一个对象下创建多个数组的查询应该是这样的。
val query1 = "SELECT PKcolumn1, collect_set(struct(column2, column3)), collect_set(struct(column4, column5)), collect_set(struct(column6, column7)) GROUP BY PKcolumn1 ORDER BY PKcolumn1"