How to group substrings in Teradata 14?

Date: 2014-04-11 20:23:54

Tags: teradata strtok

I have the following table in Teradata 14. I can't create my own procedures or functions, but I can use strtok and strtok_split_to_table.

id  property
1   1234X (Yel), 2225Y (Red), 1234X (Gre),
2
3   1222Y (Pin), 
4   1134E (Yel), 4565Y (Whi), 1134E (Red), 2222Y (Red), 

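How can I group the table above so that each property code appears only once, with all of its attributes listed inside a single pair of parentheses?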

id  property
1   1234X (Yel Gre), 2225Y (Red), 
2   
3   1222Y (Pin ),
4   1134E (Yel Red), 4565Y (Whi), 2222Y (Red), 

The property code is always a 5-character string, e.g. 1222Y. The color code is always 3 characters, e.g. Pin.
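The target transformation is easy to state outside SQL, and a small reference implementation can help when checking query output. Below is a minimal Python sketch. It assumes the input format shown above, with `CODE (Col)` tokens separated by commas and a possible trailing comma. It also assumes each group should stay in the order in which its code first appears. `group_properties` is a hypothetical helper and is not part of Teradata.

```python
def group_properties(property_str):
    """Merge 'CODE (Col), ...' tokens so each code appears once with all its colors.

    Groups keep the order in which each code first appears (dicts preserve
    insertion order in Python 3.7+). NULL or blank input yields ''.
    """
    if property_str is None or not property_str.strip():
        return ""
    groups = {}
    for token in property_str.split(","):
        token = token.strip()
        if not token:
            continue  # empty piece left by the trailing comma
        code, _, rest = token.partition("(")
        color = rest.rstrip(")").strip()
        groups.setdefault(code.strip(), []).append(color)
    return "".join(f"{code} ({' '.join(colors)}), " for code, colors in groups.items())


print(group_properties("1134E (Yel), 4565Y (Whi), 1134E (Red), 2222Y (Red), "))
```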


I tried using this solution, but got the error "A column or character expression is larger than max size".

I also tried strtok_split_to_table and was able to create a modified table, but I don't know how to proceed from there.
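After `STRTOK_SPLIT_TO_TABLE`, the next step is to split each token into a code and a color, then group by `id` and code. The Python sketch below models that pipeline. It assumes the table function returns `(id, tokennum, token)` rows and that, like C's `strtok`, it skips empty tokens. `strtok_split`, `parse_token` and `group_by_id_code` are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict


def strtok_split(row_id, text, delims=","):
    # ASSUMPTION: models STRTOK_SPLIT_TO_TABLE as splitting on any delimiter
    # character and skipping empty tokens, the way C strtok does.
    if text is None:
        return []
    pieces = [p for p in re.split("[" + re.escape(delims) + "]", text) if p]
    return [(row_id, n, tok) for n, tok in enumerate(pieces, start=1)]


def parse_token(token):
    # '1234X (Yel)' -> ('1234X', 'Yel'), roughly columns a and b in the answers
    code, _, rest = token.strip().partition("(")
    return code.strip(), rest.rstrip(")").strip()


def group_by_id_code(rows):
    # The next step: GROUP BY id, code, collecting colors in tokennum order
    groups = defaultdict(list)
    for row_id, _tokennum, token in sorted(rows):
        code, color = parse_token(token)
        if code:  # skip whitespace-only tokens such as the one after a trailing ', '
            groups[(row_id, code)].append(color)
    return dict(groups)


rows = strtok_split(1, "1234X (Yel), 2225Y (Red), 1234X (Gre),")
print(group_by_id_code(rows))
```

From the grouped rows, what remains is string assembly. The colors are concatenated per code, then the codes are concatenated per id.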

2 answers:

Answer 0 (score: 2)

Try this. I slightly modified dnoeth's query from the solution you linked:

WITH RECURSIVE cte
(id,
 len,
 remaining,
 word,
 pos
) AS (
  SELECT
    id,
    POSITION(',' IN property || ',') - 1 AS len,
    SUBSTRING(property || ',' FROM len + 2) AS remaining,
    TRIM(SUBSTRING(property FROM 1 FOR len)) AS word,
    1
  FROM TableA
  UNION ALL
  SELECT
    id,
    POSITION(',' IN remaining) - 1 AS len_new,
    SUBSTRING(remaining FROM len_new + 2),
    TRIM(SUBSTRING(remaining FROM 1 FOR len_new)),
    pos + 1
  FROM cte
  WHERE remaining <> ''
)
SELECT
  id,
  MAX(CASE WHEN newpos = 1 THEN newgrp ELSE '' END) ||
  MAX(CASE WHEN newpos = 2 THEN newgrp ELSE '' END) ||
  MAX(CASE WHEN newpos = 3 THEN newgrp ELSE '' END) ||
  MAX(CASE WHEN newpos = 4 THEN newgrp ELSE '' END) ||
  MAX(CASE WHEN newpos = 5 THEN newgrp ELSE '' END) ||
  MAX(CASE WHEN newpos = 6 THEN newgrp ELSE '' END)
  -- add as many CASEs as needed
FROM
 (
  SELECT
    id,
    ROW_NUMBER()
    OVER (PARTITION BY id
          ORDER BY newgrp) AS newpos,
    a ||
    MAX(CASE WHEN pos = 1 THEN '(' || b ELSE '' END) ||
    MAX(CASE WHEN pos = 2 THEN ' ' || b ELSE '' END) ||
    MAX(CASE WHEN pos = 3 THEN ' ' || b ELSE '' END) ||
    MAX(CASE WHEN pos = 4 THEN ' ' || b ELSE '' END) ||
    MAX(CASE WHEN pos = 5 THEN ' ' || b ELSE '' END) ||
    MAX(CASE WHEN pos = 6 THEN ' ' || b ELSE '' END)
    -- add as many CASEs as needed
    || '), ' AS newgrp
  FROM
   (
    SELECT
      id,
      ROW_NUMBER()
      OVER (PARTITION BY id, a
            ORDER BY pos) AS pos,
      SUBSTRING(word FROM 1 FOR POSITION('(' IN word) - 1) AS a,
      TRIM(TRAILING ')' FROM SUBSTRING(word FROM POSITION('(' IN word) + 1)) AS b
    FROM cte
    WHERE word <> ''
   ) AS dt
  GROUP BY id, a
 ) AS dt
GROUP BY id
UNION ALL
SELECT id, property FROM TableA WHERE property IS NULL OR TRIM(property) = ' ';
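The recursive CTE peels off one comma-delimited word per iteration. In each step, `len` is the offset of the next comma, `word` is the text before it and `remaining` is everything after it. Here is a rough Python equivalent of that loop. It assumes Teradata's blank-padded comparison semantics for `remaining <> ''`, and `split_recursive` is a hypothetical name.

```python
def split_recursive(property_str):
    """Model of the recursive CTE: one (pos, word) pair per iteration."""
    if property_str is None:
        return []
    text = property_str + ","  # the seed works on property || ','
    words = []
    pos = 1
    while True:
        # text always ends with ',' here, so find() never returns -1
        cut = text.find(",")  # POSITION(',' IN ...) - 1, i.e. a 0-based index
        words.append((pos, text[:cut].strip()))  # TRIM(SUBSTRING(... FROM 1 FOR len))
        text = text[cut + 1:]  # SUBSTRING(... FROM len + 2)
        # WHERE remaining <> '': Teradata pads with blanks, so all-blank counts as ''
        if text.strip(" ") == "":
            break
        pos += 1
    return words


print(split_recursive("1234X (Yel), 2225Y (Red), 1234X (Gre),"))
```

The empty word produced by the trailing comma is what the outer `WHERE word <> ''` filters out.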

Answer 1 (score: 2)

Why store denormalized data in an RDBMS and then process it into even worse denormalized output?

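Here is my solution from the post you linked, modified to use STRTOK_SPLIT_TO_TABLE instead of recursion (vt is the source table, with columns id and property):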

SELECT
   id,
   MAX(CASE WHEN newpos = 1 AND newgrp <> '(),' THEN newgrp ELSE '' END) ||
   MAX(CASE WHEN newpos = 2 THEN newgrp ELSE '' END) ||
   MAX(CASE WHEN newpos = 3 THEN newgrp ELSE '' END) ||
   MAX(CASE WHEN newpos = 4 THEN newgrp ELSE '' END) ||
   MAX(CASE WHEN newpos = 5 THEN newgrp ELSE '' END) ||
   MAX(CASE WHEN newpos = 6 THEN newgrp ELSE '' END)
   -- add as many CASEs as needed
FROM
 ( 
   SELECT 
     id, 
     ROW_NUMBER() 
     OVER (PARTITION BY id
           ORDER BY newgrp) AS newpos,
     TRIM(a || ' (' ||
     MAX(CASE WHEN tokennum = 1 THEN b || ' ' ELSE '' END) ||
     MAX(CASE WHEN tokennum = 2 THEN b || ' ' ELSE '' END) ||
     MAX(CASE WHEN tokennum = 3 THEN b || ' ' ELSE '' END) ||
     MAX(CASE WHEN tokennum = 4 THEN b || ' ' ELSE '' END) ||
     MAX(CASE WHEN tokennum = 5 THEN b || ' ' ELSE '' END) ||
     MAX(CASE WHEN tokennum = 6 THEN b || ' ' ELSE '' END)
     -- add as many CASEs as needed
     ) || '), ' AS newgrp
   FROM 
    (
      SELECT
        id, tokennum,
        TRIM(SUBSTRING(token FROM 1 FOR POSITION('(' IN TRIM(token)||'(') - 1)) AS a,
        TRIM(TRAILING ')' FROM SUBSTRING(token FROM POSITION('(' IN token) + 1)) AS b
      FROM
        TABLE( STRTOK_SPLIT_TO_TABLE(vt.id, vt.property, ',')
        RETURNS (id INT,
                 tokennum INT, 
                 token VARCHAR(30) CHARACTER SET UNICODE
                )
             ) AS dt
    ) AS dt
   GROUP BY id, a
 ) AS dt
GROUP BY id;
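Both queries build one string from a fixed number of `MAX(CASE WHEN pos = n ...)` terms. They also number the groups with `ROW_NUMBER() ... ORDER BY newgrp`. The Python model below covers those two steps, with Python's string sort standing in for the database collation. `pivot_concat` and `number_and_concat` are hypothetical names.

If I read the queries correctly, two consequences follow:

- Positions beyond the last CASE term are silently dropped.
- Groups come out in alphabetical order, so id 4 would read `1134E, 2222Y, 4565Y` rather than the order shown in the question.

```python
def pivot_concat(values_by_pos, slots=6):
    """Model MAX(CASE WHEN pos = 1 ...) || ... || MAX(CASE WHEN pos = slots ...).

    values_by_pos maps a 1-based position to a string. A missing position
    contributes '' (the ELSE branch); positions above `slots` have no CASE
    term at all and are dropped.
    """
    return "".join(values_by_pos.get(n, "") for n in range(1, slots + 1))


def number_and_concat(groups, slots=6):
    # ROW_NUMBER() OVER (PARTITION BY id ORDER BY newgrp): alphabetical numbering
    numbered = {n: g for n, g in enumerate(sorted(groups), start=1)}
    return pivot_concat(numbered, slots)


print(number_and_concat(["1134E (Yel Red), ", "4565Y (Whi), ", "2222Y (Red), "]))
```

Raising `slots` corresponds to adding more CASE terms ("add as many CASEs as needed").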

If you have access to the TDStats.udfconcat function, this can be simplified further (but there is no way to control the order of the attributes):

SELECT id, 
   CASE
     WHEN TRIM(TDStats.udfconcat(' ' || a || ' ' || b)) || ',' <> '(),'
     THEN TRIM(TDStats.udfconcat(' ' || a || ' ' || b)) || ','
     ELSE ''
   END
FROM 
 (
   SELECT
     id,
     TRIM(SUBSTRING(token FROM 1 FOR POSITION('(' IN TRIM(token)||'(') - 1)) AS a,
     '(' || OTRANSLATE(TDStats.udfconcat(TRIM(TRAILING ')' FROM SUBSTRING(token FROM POSITION('(' IN token) + 1))), ',', ' ') || ')' AS b
   FROM
     TABLE( STRTOK_SPLIT_TO_TABLE(vt.id, vt.property, ',')
     RETURNS (id INT,
              tokennum INT, 
              token VARCHAR(30) CHARACTER SET UNICODE
             )
          ) AS dt
   GROUP BY id, a
 ) AS dt
GROUP BY id;
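The `udfconcat` version relies on two levels of aggregation. The inner query concatenates the colors per `(id, a)`, and the outer one concatenates the resulting groups per `id`. The Python model below assumes `udfconcat` joins its inputs with a bare comma. That is what the `OTRANSLATE(..., ',', ' ')` call and the leading `' '` in `' ' || a || ' ' || b` suggest, but it is an assumption, not documented behavior. The real aggregate guarantees no order, so this model simply keeps input order. The function names are hypothetical.

```python
def udfconcat(values):
    # ASSUMPTION: models TDStats.udfconcat as joining values with a bare ','.
    # The real aggregate guarantees no order; this model keeps input order.
    return ",".join(values)


def grouped_udfconcat(tokens):
    """tokens: (code, color) pairs for one id, e.g. ('1234X', 'Yel')."""
    # Inner query: GROUP BY id, a
    by_code = {}
    for code, color in tokens:
        by_code.setdefault(code, []).append(color)
    pieces = []
    for code, colors in by_code.items():
        # '(' || OTRANSLATE(udfconcat(b), ',', ' ') || ')' AS b
        b = "(" + udfconcat(colors).replace(",", " ") + ")"
        pieces.append(" " + code + " " + b)  # ' ' || a || ' ' || b
    # Outer query: TRIM(udfconcat(' ' || a || ' ' || b)) || ','
    return udfconcat(pieces).strip() + ","


print(grouped_udfconcat([("1234X", "Yel"), ("2225Y", "Red"), ("1234X", "Gre")]))
```

The leading space on each piece is what turns the bare `,` separator into `, ` in the final string.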

Most of the work is fiddling with spaces and commas in the right places to get the desired output.

Still, I would never store data like this in an RDBMS.