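Removing duplicate FIELD values in collect_list(struct(`FIELD`))

Asked: 2018-07-18 06:49:31

Tags: json apache-spark apache-spark-sql

I have a table with more than 3 million records.

I ran a SQL query against the table; its first 10 result rows are shown below.

SQL query: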

SELECT top 10 ACCOUNTNO, VEHICLENUMBER, CUSTOMERID FROM [ISSUER].[HISTORY].[TP_CUSTOMER_PREPAIDACCOUNTS] GROUP BY ACCOUNTNO, VEHICLENUMBER, CUSTOMERID ORDER BY ACCOUNTNO 

ACCOUNTNO   VEHICLENUMBER   CUSTOMERID
10003014    MH43AJ411   20000000
10003014    MH43AJ411   20000001
10003015    MH12GZ3392  20000002
10003016    GJ15Z8173   20000003
10003018    MH05AM902   20000004
10003019    GJ15CB727   20000008
10003019    GJ15CD7387  20029961
10003019    GJ15CD7477  20001690
10003019    GJ15CD7657  20001866
10003019    MH02DG7774  20000933

I need to design and export a JSON file that should look like this:

{
    "ACCOUNTNO":10003014,
    "VEHICLE": [
        { "VEHICLENUMBER":"MH43AJ411", "CUSTOMERID":20000000},
        { "VEHICLENUMBER":"MH43AJ411", "CUSTOMERID":20000001}
    ],
    "ACCOUNTNO":10003015,
    "VEHICLE": [
        { "VEHICLENUMBER":"MH12GZ3392", "CUSTOMERID":20000002}
    ]
}

I ran the following code in my Spark program:

jdbcDF.registerTempTable("tp_customer_account")
val res00 = sqlContext.sql("SELECT ACCOUNTNO, collect_list(struct(`VEHICLENUMBER`, `CUSTOMERID`)) as VEHICLE FROM tp_customer_account GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO") 

res00.coalesce(1).write.json("D:/res06")

This is the result I got from the code above:

{"ACCOUNTNO":10003014,"VEHICLE":[{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000000}]}
{"ACCOUNTNO":10003015,"VEHICLE":[{"VEHICLENUMBER":"MH12GZ3392","CUSTOMERID":20000002}]}
{"ACCOUNTNO":10003016,"VEHICLE":[{"VEHICLENUMBER":"GJ15Z8173","CUSTOMERID":20000003},{"VEHICLENUMBER":"GJ15Z8173","CUSTOMERID":20000003},{"VEHICLENUMBER":"GJ15Z8173","CUSTOMERID":20000003},{"VEHICLENUMBER":"GJ15Z8173","CUSTOMERID":20000003},{"VEHICLENUMBER":"GJ15Z8173","CUSTOMERID":20000003}]}
{"ACCOUNTNO":10003018,"VEHICLE":[{"VEHICLENUMBER":"MH05AM902","CUSTOMERID":20000004},{"VEHICLENUMBER":"MH05AM902","CUSTOMERID":20000004},{"VEHICLENUMBER":"MH05AM902","CUSTOMERID":20000004},{"VEHICLENUMBER":"MH05AM902","CUSTOMERID":20000004},{"VEHICLENUMBER":"MH05AM902","CUSTOMERID":20000004}]}
{"ACCOUNTNO":10003019,"VEHICLE":[{"VEHICLENUMBER":"GJ15CF7747","CUSTOMERID":20009020},{"VEHICLENUMBER":"GJ15CB9601","CUSTOMERID":20001557},{"VEHICLENUMBER":"GJ15CA7837","CUSTOMERID":20001223},{"VEHICLENUMBER":"MH02DG7774","CUSTOMERID":20000933},{"VEHICLENUMBER":"GJ15CD7387","CUSTOMERID":20029961},{"VEHICLENUMBER":"GJ15CF7747","CUSTOMERID":20009020},{"VEHICLENUMBER":"GJ15CB9601","CUSTOMERID":20001557},{"VEHICLENUMBER":"MH02DG7774","CUSTOMERID":20000933},{"VEHICLENUMBER":"GJ15CD7657","CUSTOMERID":20001866},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"GJ15CD7387","CUSTOMERID":20029961},{"VEHICLENUMBER":"GJ15CD7387","CUSTOMERID":20029961},{"VEHICLENUMBER":"GJ15CD7387","CUSTOMERID":20029961},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CF7747","CUSTOMERID":20009020},{"VEHICLENUMBER":"GJ15CF7747","CUSTOMERID":20009020},{"VEHICLENUMBER":"GJ15CD7657","CUSTOMERID":20001866},{"VEHICLENUMBER":"GJ15CD7657","CUSTOMERID":20001866},{"VEHICLENUMBER":"GJ15CB9601","CUSTOMERID":20001557},{"VEHICLENUMBER":"GJ15CB9601","CUSTOMERID":20001557},{"VEHICLENUMBER":"GJ15CB9601","CUSTOMERID":20001557},{"VEHICLENUMBER":"GJ15CB9601","CUSTOMERID":20001557},{"VEHICLENUMBER":"GJ15CB727","CUSTOMERID":20000008},{"VEHICLENUMBER":"GJ15CB727","CUSTOMERID":20000008},{"VEHICLENUMBER":"GJ15CB727","CUSTOMERID":20000008},{"VEHICLENUMBER":"GJ15CB727","CUSTOMERID":20000008},{"VEHICLENUMBER":"GJ15CB727","CUSTOMERID":20000008},{"VEHICLENUMBER":"GJ15CD7657","CUSTOMERID":20001866},{"VEHICLENUMBER":"GJ15CD7657","CUSTOMERID":20001866},{"VEHICLENUMBER":"GJ15CD7657","CUSTOMERID":20001866},{"VEHICLENUMBER":"GJ15CD7657","CUSTOMERID":20001866},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"GJ15CD7657","CUSTOMERID":20001866},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CD7657","CUSTOMERID":20001866},{"VEHICLENUMBER":"GJ15CA7387","CUSTOMERID":20001865},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CD7477","CUSTOMERID":20001690},{"VEHICLENUMBER":"GJ15CB9601","CUSTOMERID":20001557},{"VEHICLENUMBER":"GJ15CB9601","CUSTOMERID":20001557},{"VEHICLENUMBER":"GJ15CA7837","CUSTOMERID":20001223},{"VEHICLENUMBER":"GJ15CA7837","CUSTOMERID":20001223},{"VEHICLENUMBER":"MH02DG7774","CUSTOMERID":20000933},{"VEHICLENUMBER":"GJ15CB727","CUSTOMERID":20000008},{"VEHICLENUMBER":"MH02BY7774","CUSTOMERID":20000005}]}
{"ACCOUNTNO":10003020,"VEHICLE":[{"VEHICLENUMBER":"MH01AX5658","CUSTOMERID":20000006},{"VEHICLENUMBER":"MH01AX5658","CUSTOMERID":20000006},{"VEHICLENUMBER":"MH01AX5658","CUSTOMERID":20000006},{"VEHICLENUMBER":"MH01AX5658","CUSTOMERID":20000006},{"VEHICLENUMBER":"MH01AX5658","CUSTOMERID":20000006},{"VEHICLENUMBER":"MH01AX5658","CUSTOMERID":20000006},{"VEHICLENUMBER":"MH01AX5658","CUSTOMERID":20000006},{"VEHICLENUMBER":"MH01AX5658","CUSTOMERID":20000006}]}
{"ACCOUNTNO":10003021,"VEHICLE":[{"VEHICLENUMBER":"GJ15AD727","CUSTOMERID":20000007}]}

As we can see, the same VEHICLENUMBER appears in the list multiple times.
How can I remove these duplicate values from the list? Please help! Thank you.

  

In the input table: ACCOUNTNO is unique, the same ACCOUNTNO may have more than one VEHICLENUMBER, and each vehicle may have a unique CUSTOMERID associated with its VEHICLENUMBER.
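
One possible way to get rid of the repeats, sketched here under assumptions rather than offered as a confirmed fix: collect_list keeps every input row, so if the source table holds the same (ACCOUNTNO, VEHICLENUMBER, CUSTOMERID) combination several times, each copy ends up in the array. The SQL Server query above hides this because its GROUP BY over all three columns collapses the copies. Doing the same collapse in a SELECT DISTINCT subquery before collect_list should give one entry per distinct vehicle/customer pair. The sketch assumes Spark 2.x or later (SparkSession and createOrReplaceTempView instead of sqlContext and registerTempTable); the object name DedupVehicles, the helper nestDistinctVehicles and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DedupVehicles {

  // Drops exact duplicate (ACCOUNTNO, VEHICLENUMBER, CUSTOMERID) rows first,
  // then nests the remaining vehicles under each account.
  def nestDistinctVehicles(spark: SparkSession, accounts: DataFrame): DataFrame = {
    accounts.createOrReplaceTempView("tp_customer_account")
    spark.sql(
      """SELECT ACCOUNTNO,
        |       collect_list(struct(VEHICLENUMBER, CUSTOMERID)) AS VEHICLE
        |FROM (
        |  SELECT DISTINCT ACCOUNTNO, VEHICLENUMBER, CUSTOMERID
        |  FROM tp_customer_account
        |) deduped
        |GROUP BY ACCOUNTNO
        |ORDER BY ACCOUNTNO""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dedup-vehicles")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample modelled on the question's data, with repeated rows.
    val sample = Seq(
      (10003014L, "MH43AJ411", 20000000L),
      (10003014L, "MH43AJ411", 20000001L),
      (10003014L, "MH43AJ411", 20000001L),
      (10003015L, "MH12GZ3392", 20000002L),
      (10003015L, "MH12GZ3392", 20000002L)
    ).toDF("ACCOUNTNO", "VEHICLENUMBER", "CUSTOMERID")

    val res = nestDistinctVehicles(spark, sample)
    res.show(false)
    // To export as in the question: res.coalesce(1).write.json("D:/res06")

    spark.stop()
  }
}
```

Because DISTINCT is applied to the whole row, a vehicle that really is linked to two different CUSTOMERID values (as MH43AJ411 is for account 10003014) keeps both entries, which matches the desired output. In Spark 2.x, replacing collect_list with collect_set in the original query is likely a shorter alternative; neither variant guarantees the order of elements inside the array.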

0 answers:

No answers yet