I need some guidance on solving this problem. I have a dataset similar to the following:
Record_type Record_Text
H01 ABCDEFGHI123456789
D45 BCDEFGH098765
D20 BABRTHYUHU56789
D30 QWERTY09876558471255
D12 ASDFGHJ9814752
H02 UGHRYCGDF12304025
G80 YHNBGTRFV0147852
H01 MLOPKIJUHNB624817
D20 PLKIJUNHMY7653235
H15 MVNBDGETDGSTEX9874
D30 GNHGDTBFJVNV834687
H02 JDGHKDGHSDFIG7845387
D60 GHCNDBDGCTEF45367
Every occurrence of H01 starts a new transaction. In the example above there are two transactions (H01 through G80, and H01 through D60).
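The grouping rule can be pinned down with a minimal Python sketch (an illustration of the rule, not Teradata SQL). It assumes the rows are processed in exactly the order listed above; in a real table that order has to come from some ordering column. `ROWS` and `split_transactions` are hypothetical names.

```python
# Hypothetical in-memory copy of the sample rows, in the order listed above.
ROWS = [
    ("H01", "ABCDEFGHI123456789"), ("D45", "BCDEFGH098765"),
    ("D20", "BABRTHYUHU56789"), ("D30", "QWERTY09876558471255"),
    ("D12", "ASDFGHJ9814752"), ("H02", "UGHRYCGDF12304025"),
    ("G80", "YHNBGTRFV0147852"), ("H01", "MLOPKIJUHNB624817"),
    ("D20", "PLKIJUNHMY7653235"), ("H15", "MVNBDGETDGSTEX9874"),
    ("D30", "GNHGDTBFJVNV834687"), ("H02", "JDGHKDGHSDFIG7845387"),
    ("D60", "GHCNDBDGCTEF45367"),
]


def split_transactions(rows):
    """Group rows into transactions: every H01 row starts a new one."""
    transactions = []
    for record_type, text in rows:
        if record_type == "H01":
            transactions.append([])  # open a new transaction
        if transactions:  # rows before the first H01 belong to no transaction
            transactions[-1].append((record_type, text))
    return transactions


txns = split_transactions(ROWS)
print(len(txns))                 # 2
print([t[-1][0] for t in txns])  # ['G80', 'D60']
```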
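For each transaction I need to pick certain characters out of the RECORD_TEXT field, depending on the record type. I tried it for the first transaction on its own, using the code below:
SELECT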
( MAX(CASE WHEN RECORD_TYPE = 'H01' THEN (SUBSTR(RECORD_TEXT,1,10)) END)
|| MAX(CASE WHEN RECORD_TYPE = 'D20' THEN ',' || (SUBSTR(RECORD_TEXT,2,3)) END)
|| MAX(CASE WHEN RECORD_TYPE = 'D30' THEN ',' || (SUBSTR(RECORD_TEXT,9,8)) END)
|| MAX(CASE WHEN RECORD_TYPE = 'H02' THEN ',' || (SUBSTR(RECORD_TEXT,13,4)) END)) AS TOTAL_FIELD
FROM TABLE
I got the expected output:
ABCDEFGHI1,ABR,87655847,0402
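To double-check the character positions, here is a small Python sketch of the same extraction. Teradata's SUBSTR(text, start, length) counts from 1, so the hypothetical helper `substr` shifts to Python's 0-based slicing. `RULES` and `total_field` are also illustrative names, and the sketch assumes each of the four record types occurs exactly once in the transaction.

```python
# Hypothetical extraction rules mirroring the SUBSTR calls in the query:
# record type -> (1-based start, length), as in SUBSTR(RECORD_TEXT, start, length).
RULES = {"H01": (1, 10), "D20": (2, 3), "D30": (9, 8), "H02": (13, 4)}


def substr(text, start, length):
    """Emulate a 1-based SQL SUBSTR with Python's 0-based slicing."""
    return text[start - 1:start - 1 + length]


def total_field(transaction):
    """Concatenate the extracted pieces in H01, D20, D30, H02 order."""
    pieces = {rt: substr(txt, *RULES[rt]) for rt, txt in transaction if rt in RULES}
    return ",".join(pieces[rt] for rt in ("H01", "D20", "D30", "H02"))


first_txn = [
    ("H01", "ABCDEFGHI123456789"), ("D45", "BCDEFGH098765"),
    ("D20", "BABRTHYUHU56789"), ("D30", "QWERTY09876558471255"),
    ("D12", "ASDFGHJ9814752"), ("H02", "UGHRYCGDF12304025"),
    ("G80", "YHNBGTRFV0147852"),
]
print(total_field(first_txn))  # ABCDEFGHI1,ABR,87655847,0402
```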
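But I can't get it to work for the subsequent transactions.
The expected output for the example above (two transactions = two rows) would be: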
ABCDEFGHI1,ABR,87655847,0402
MLOPKIJUHN,LKI,JVNV8346,G784
There are about 200 transactions in total. I'm using Teradata version 14. Please help.
Answer 0 (score: 0)
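This is a job for analytic functions. I'm not familiar with Teradata, but it should be similar to what is available elsewhere.
Look into partitioning; see this link:
http://www.tutorialspoint.com/teradata/teradata_partitioned_primary_index.htm
Essentially, you can slice the data however you like, so you would do something like this: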
PARTITION BY Record_type
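Unless you add a column as Tyron78 suggested, you will also have to use other analytic functions to work out which records fall between the start and end of each set.
Hope this helps.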
Answer 1 (score: 0)
-- SQL Server (T-SQL) syntax: table variable, ISNULL, SUBSTRING and PIVOT
DECLARE @t table(
CREATE_TMSP int, Record_type nvarchar(20), Record_Text nvarchar(50)
);
INSERT INTO @t VALUES(1,'H01','ABCDEFGHI123456789');
INSERT INTO @t VALUES(2,'D45','BCDEFGH098765');
INSERT INTO @t VALUES(3,'D20','BABRTHYUHU56789');
INSERT INTO @t VALUES(4,'D30','QWERTY09876558471255');
INSERT INTO @t VALUES(5,'D12','ASDFGHJ9814752');
INSERT INTO @t VALUES(6,'H02','UGHRYCGDF12304025');
INSERT INTO @t VALUES(7,'G80','YHNBGTRFV0147852');
INSERT INTO @t VALUES(8,'H01','MLOPKIJUHNB624817');
INSERT INTO @t VALUES(9,'D20','PLKIJUNHMY7653235');
INSERT INTO @t VALUES(10,'H15','MVNBDGETDGSTEX9874');
INSERT INTO @t VALUES(11,'D30','GNHGDTBFJVNV834687');
INSERT INTO @t VALUES(12,'H02','JDGHKDGHSDFIG7845387');
INSERT INTO @t VALUES(13,'D60','GHCNDBDGCTEF45367');
-- cte: number the rows in CREATE_TMSP order
WITH cte AS(
SELECT RECORD_TYPE, RECORD_TEXT, DENSE_RANK() OVER(ORDER BY CREATE_TMSP) AS DERIVED_COLUMN
FROM @t
),
-- cteLead: each H01 header with the position of the next H01 (999999 for the last one)
cteLead AS(
SELECT Record_Type, Record_Text, DERIVED_COLUMN AS DERIVED_COLUMN_LEFT, ISNULL(LEAD(DERIVED_COLUMN) OVER (ORDER BY DERIVED_COLUMN), 999999) AS DERIVED_COLUMN_RIGHT
FROM cte
WHERE Record_type = 'H01'
),
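-- cteSplit: attach every row to the header whose range [LEFT, RIGHT) contains it,
-- and cut out the substring that belongs to its record type
cteSplit AS(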
SELECT a.DERIVED_COLUMN_LEFT AS ID, a.Record_Type AS RecordTypeHead, a.Record_Text AS RecordTextHead, a.DERIVED_COLUMN_LEFT, a.DERIVED_COLUMN_RIGHT,
b.Record_Type,
CASE
WHEN b.Record_type = 'H01' THEN SUBSTRING(b.RECORD_TEXT,1,10)
WHEN b.Record_type = 'D20' THEN SUBSTRING(b.RECORD_TEXT,2,3)
WHEN b.Record_type = 'D30' THEN SUBSTRING(b.RECORD_TEXT,9,8)
WHEN b.Record_type = 'H02' THEN SUBSTRING(b.RECORD_TEXT,13,4)
END AS RecordTextSplit
FROM cteLead AS a
JOIN cte AS b ON b.DERIVED_COLUMN >= a.DERIVED_COLUMN_LEFT AND b.DERIVED_COLUMN < a.DERIVED_COLUMN_RIGHT
WHERE b.Record_type IN ('H01', 'D20', 'D30', 'H02')
)
-- one output row per transaction, one column per record type
SELECT * FROM cteSplit
PIVOT
(
MAX(RecordTextSplit)
FOR Record_Type IN (H01, D20, D30, H02)
) AS pvt
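The idea behind these CTEs, in outline: number the rows, treat the H01 rows as headers, pair each header's position with the next header's position via LEAD (999999 when there is none), and attach every row whose position lies in [left, right). A minimal Python sketch of that range logic follows. It assumes DENSE_RANK yields 1 to 13 here because the CREATE_TMSP values are distinct, so positions equal timestamps; `header_ranges` and `assign` are hypothetical names.

```python
# Hypothetical sample: (CREATE_TMSP, Record_type) pairs, as inserted into @t.
ROWS = [(1, "H01"), (2, "D45"), (3, "D20"), (4, "D30"), (5, "D12"), (6, "H02"),
        (7, "G80"), (8, "H01"), (9, "D20"), (10, "H15"), (11, "D30"), (12, "H02"),
        (13, "D60")]
SENTINEL = 999999  # the same "no next header" value as ISNULL(LEAD(...), 999999)


def header_ranges(rows):
    """Pair each H01 position with the next H01 position, like cteLead."""
    starts = [pos for pos, rt in rows if rt == "H01"]
    ends = starts[1:] + [SENTINEL]  # LEAD(...) with a default for the last header
    return list(zip(starts, ends))


def assign(rows):
    """Range join: a row belongs to header `left` when left <= pos < right."""
    return {left: [rt for pos, rt in rows if left <= pos < right]
            for left, right in header_ranges(rows)}


print(header_ranges(ROWS))  # [(1, 8), (8, 999999)]
print(assign(ROWS)[8])      # ['H01', 'D20', 'H15', 'D30', 'H02', 'D60']
```

The sentinel only acts as an open upper bound for the last transaction; any value larger than the highest row number would do.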
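Answer 2 (score: 0)
Once you add a timestamp column such as CREATE_TMSP (the rows of a table have no inherent order, so something has to define it), it is easy to assign a unique number to each transaction with a running count of H01 rows. Then you can apply your existing calculation per transaction number: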
SELECT
trans#,
MAX(CASE WHEN RECORD_TYPE = 'H01' THEN (SUBSTR(RECORD_TEXT, 1,10)) END)
|| MAX(CASE WHEN RECORD_TYPE = 'D20' THEN ',' || (SUBSTR(RECORD_TEXT, 2, 3)) END)
|| MAX(CASE WHEN RECORD_TYPE = 'D30' THEN ',' || (SUBSTR(RECORD_TEXT, 9, 8)) END)
|| MAX(CASE WHEN RECORD_TYPE = 'H02' THEN ',' || (SUBSTR(RECORD_TEXT,13, 4)) END) AS TOTAL_FIELD
FROM
(
SELECT CREATE_TMSP,RECORD_TYPE, RECORD_TEXT,
-- assign a unique number to each transaction
SUM(CASE WHEN Record_type = 'H01' THEN 1 ELSE 0 END)
OVER (ORDER BY CREATE_TMSP
ROWS UNBOUNDED PRECEDING) AS trans#
FROM table
-- more efficient to filter unneeded data before the OLAP function
WHERE RECORD_TYPE IN ('H01','D20','D30','H02')
-- uncomment if the data doesn't start with an 'H01' row and you don't want partial transactions
-- QUALIFY trans# > 0
) AS dt
GROUP BY trans#
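The same steps written out in Python, as a sketch for checking the logic against the sample data rather than a replacement for the query: sort by CREATE_TMSP, keep the four record types, turn a running count of H01 rows into trans#, then group. The names are hypothetical and, like the SQL, it assumes every transaction contains all four record types (the SQL would return NULL for an incomplete one; this sketch would raise KeyError).

```python
from itertools import accumulate, groupby

# Hypothetical sample rows: (CREATE_TMSP, Record_type, Record_Text).
ROWS = [
    (1, "H01", "ABCDEFGHI123456789"), (2, "D45", "BCDEFGH098765"),
    (3, "D20", "BABRTHYUHU56789"), (4, "D30", "QWERTY09876558471255"),
    (5, "D12", "ASDFGHJ9814752"), (6, "H02", "UGHRYCGDF12304025"),
    (7, "G80", "YHNBGTRFV0147852"), (8, "H01", "MLOPKIJUHNB624817"),
    (9, "D20", "PLKIJUNHMY7653235"), (10, "H15", "MVNBDGETDGSTEX9874"),
    (11, "D30", "GNHGDTBFJVNV834687"), (12, "H02", "JDGHKDGHSDFIG7845387"),
    (13, "D60", "GHCNDBDGCTEF45367"),
]
# record type -> (1-based start, length), taken from the SUBSTR calls in the query.
RULES = {"H01": (1, 10), "D20": (2, 3), "D30": (9, 8), "H02": (13, 4)}


def total_fields(rows):
    # ORDER BY CREATE_TMSP, then WHERE RECORD_TYPE IN ('H01','D20','D30','H02')
    kept = [r for r in sorted(rows) if r[1] in RULES]
    # Running count of H01 rows, like SUM(...) OVER (... ROWS UNBOUNDED PRECEDING)
    trans_nos = accumulate(1 if rt == "H01" else 0 for _, rt, _ in kept)
    numbered = list(zip(trans_nos, kept))
    result = []
    # GROUP BY trans#; the numbers never decrease, so adjacent grouping suffices
    for trans_no, group in groupby(numbered, key=lambda item: item[0]):
        if trans_no == 0:
            continue  # rows before the first H01, like QUALIFY trans# > 0
        pieces = {}
        for _, (_, rt, txt) in group:
            start, length = RULES[rt]
            pieces[rt] = txt[start - 1:start - 1 + length]
        result.append(",".join(pieces[rt] for rt in RULES))  # H01, D20, D30, H02
    return result


for line in total_fields(ROWS):
    print(line)
# ABCDEFGHI1,ABR,87655847,0402
# MLOPKIJUHN,LKI,JVNV8346,G784
```

`itertools.groupby` only merges adjacent items, which is enough here because the running count never decreases; the SQL GROUP BY has no such requirement.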