How to eliminate duplicates and combine column values into a single string in Vertica

Time: 2018-09-13 16:20:10

Tags: sql vertica

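I am trying to join three tables, but one of them has multiple event_code values for the same CSO_Item_key, which duplicates records in the result. Note that my source is Vertica and my target is SQL Server. I tried a few things, including the XML approach, but it does not work on Vertica; it reports incorrect syntax for XML. Is there another workaround?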

Table 1

Entry Date      Cso Item Key    Fail Code
8/1/2018 4:28   BLXB796201      CSL120
8/1/2018 4:40   BLXB799101      CLL250
8/1/2018 4:55   BLXB803001      CMS130
8/1/2018 5:08   BLXB806201      CNE100

Table 2

Cso Item Key    Event Code
BLXB796201      GTS
BLXB796201      LC28
BLXB796201      SDR4
BLXB799101      GTS
BLXB799101      LC28
BLXB799101      SDR4
BLXB803001      GTS
BLXB803001      LC28
BLXB803001      SDR4
BLXB806201      GTS
BLXB806201      LC28
BLXB806201      SDR4

Table 3

Fail Code  Desc
CSL120     Bad Part
CLL250     Unit Scrapped
CNE100     OS Reinstall
CBN101     NTF

Expected result:

Entry_Date      Cso_Item_Key  Fail_Code  Desc           Event_Code
8/1/2018 4:28   BLXB796201    CSL120     Bad Part       GTS,LC28,SDR4
8/1/2018 4:40   BLXB799101    CLL250     Unit Scrapped  GTS,LC28,SDR4
8/1/2018 4:55   BLXB803001    CMS130     NULL           GTS,LC28,SDR4
8/1/2018 5:08   BLXB806201    CNE100     OS Reinstall   GTS,LC28,SDR4

2 answers:

Answer 0: (score: 0)

One of the few solutions I can see for this is the strings_package extension, which can be found on GitHub. With it, you can use the group_concat function like this:

-- get a list of nodes
select group_concat(node_name) over () from nodes;

-- nodes with storage for a projection
select schema_name,projection_name,
group_concat(node_name) over (partition by schema_name,projection_name) 
from (select distinct node_name,schema_name,projection_name from storage_containers) sc order by schema_name, projection_name;
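
Applied to the question's data, the same pattern might look like the sketch below. It assumes the strings_package extension is installed and that the event codes live in a table named table2 with columns cso_item_key and event_code (both names are placeholders for the real ones). I have not checked the exact separator or the output column name that group_concat produces, so verify those before joining the result to the other two tables.

```sql
-- Sketch only: one row per Cso Item Key, with its event codes combined.
-- table2, cso_item_key and event_code are assumed names.
-- The inner DISTINCT removes repeated (key, code) pairs before concatenating.
select cso_item_key,
group_concat(event_code) over (partition by cso_item_key)
from (select distinct cso_item_key, event_code from table2) t2
order by cso_item_key;
```

Because each key then appears only once, joining this result to Table 1 on cso_item_key should no longer multiply the rows.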

Answer 1: (score: 0)

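This is an attempt to do it all in plain SQL. It is a bit of a cheat, because I rely on the fact that Table_2 always has exactly 3 different event codes for each CSO Item Key.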

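If that is not the case, you will have to add rows to the i index table that I create as a common table expression, up to the maximum number of event codes per CSO Item Key. You will also have to left join the i table to tb2 and add some NULL handling logic around terms like ||','||MAX(CASE i.i WHEN 2 THEN event_code END), so that an empty string is concatenated when the event_code in that term is NULL. Without it, a single NULL term makes the whole concatenation NULL.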

Otherwise, using your input (which you should remove from the query when you use it for real), it could look like this:

WITH
-- your input, don't use in real query ...
tb1(Entry_Date,Cso_Item_Key,Fail_Code) AS (
          SELECT TIMESTAMP '8/1/2018 4:28','BLXB796201','CSL120'
UNION ALL SELECT TIMESTAMP '8/1/2018 4:40','BLXB799101','CLL250'
UNION ALL SELECT TIMESTAMP '8/1/2018 4:55','BLXB803001','CMS130'
UNION ALL SELECT TIMESTAMP '8/1/2018 5:08','BLXB806201','CNE100'
)
,
tb2(Cso_Item_Key,Event_Code) AS (
          SELECT 'BLXB796201','GTS'
UNION ALL SELECT 'BLXB796201','LC28'
UNION ALL SELECT 'BLXB796201','SDR4'
UNION ALL SELECT 'BLXB799101','GTS'
UNION ALL SELECT 'BLXB799101','LC28'
UNION ALL SELECT 'BLXB799101','SDR4'
UNION ALL SELECT 'BLXB803001','GTS'
UNION ALL SELECT 'BLXB803001','LC28'
UNION ALL SELECT 'BLXB803001','SDR4'
UNION ALL SELECT 'BLXB806201','GTS'
UNION ALL SELECT 'BLXB806201','LC28'
UNION ALL SELECT 'BLXB806201','SDR4'
)
,
tb3(Fail_Code,Descr) AS (
          SELECT 'CSL120','Bad Part'
UNION ALL SELECT 'CLL250','Unit Scrapped'
UNION ALL SELECT 'CNE100','OS Reinstall'
UNION ALL SELECT 'CBN101','NTF'
)
-- real WITH clause starts here - and table "i" can contain more than 3 rows..
,
i(i) AS (
          SELECT  1
UNION ALL SELECT  2
UNION ALL SELECT  3
)
,
tb2_w_i AS (
SELECT
  *
, ROW_NUMBER() OVER (PARTITION BY cso_item_key ORDER BY event_code) AS i
FROM tb2
)
,
tb2_pivot AS (
SELECT
  cso_item_key
,      MAX(CASE i.i WHEN 1 THEN event_code END)
||','||MAX(CASE i.i WHEN 2 THEN event_code END)
||','||MAX(CASE i.i WHEN 3 THEN event_code END)
  AS event_codes
FROM tb2_w_i JOIN i USING(i)
GROUP BY 1
)
SELECT
  entry_date
, tb1.cso_item_key
, tb1.fail_code
, descr
, event_codes
FROM tb1
JOIN tb2_pivot USING(cso_item_key)
LEFT JOIN tb3 USING(fail_code)
;

Result (my NULLSTRING is a dash..)

entry_date         |cso_item_key|fail_code|descr        |event_codes
2018-08-01 04:28:00|BLXB796201  |CSL120   |Bad Part     |GTS,LC28,SDR4
2018-08-01 04:40:00|BLXB799101  |CLL250   |Unit Scrapped|GTS,LC28,SDR4
2018-08-01 04:55:00|BLXB803001  |CMS130   |-            |GTS,LC28,SDR4
2018-08-01 05:08:00|BLXB806201  |CNE100   |OS Reinstall |GTS,LC28,SDR4
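
For the case mentioned above where a key has fewer than three event codes, the NULL handling could be sketched roughly as follows. This is only an illustration under assumptions: the key 'BLXB999901' is made up to show the short case, and the sketch drops the i index table because the CASE on the row number already picks each position. Every term after the first is wrapped in COALESCE, so a missing position contributes an empty string instead of turning the whole result into NULL.

```sql
WITH
-- sample input only; 'BLXB999901' is a hypothetical key with a single event code
tb2(cso_item_key,event_code) AS (
          SELECT 'BLXB796201','GTS'
UNION ALL SELECT 'BLXB796201','LC28'
UNION ALL SELECT 'BLXB796201','SDR4'
UNION ALL SELECT 'BLXB999901','GTS'
)
,
-- number the event codes within each key
tb2_w_i AS (
SELECT
  cso_item_key
, event_code
, ROW_NUMBER() OVER (PARTITION BY cso_item_key ORDER BY event_code) AS i
FROM tb2
)
SELECT
  cso_item_key
,          MAX(CASE i WHEN 1 THEN event_code END)
||COALESCE(','||MAX(CASE i WHEN 2 THEN event_code END),'')
||COALESCE(','||MAX(CASE i WHEN 3 THEN event_code END),'')
  AS event_codes
FROM tb2_w_i
GROUP BY 1
ORDER BY 1
;
```

With this input I would expect GTS,LC28,SDR4 for the first key and just GTS for the hypothetical one. To support more than three codes per key, add one more COALESCE line per extra position.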