Input:
| cust_no | month_nr | resource| segment |
|---------|----------|---------|---------|
| 1 | jan-18 | r3 | s1 |
| 1 | feb-18 | r4 | s1 |
| 1 | mar-18 | r2 | s1 |
| 1 | apr-18 | r3 | s1 |
| 1 | jun-18 | r7 | s1 |
| 2 | may-18 | r4 | s2 |
| 2 | jun-18 | r2 | s2 |
| 2 | aug-18 | r3 | s3 |
| 2 | sep-18 | r2 | s4 |
| 2 | oct-18 | r4 | s4 |
| 2 | nov-18 | r1 | s4 |
| 3 | sep-18 | r7 | s2 |
| 3 | oct-18 | r9 | s1 |
| 3 | nov-18 | r2 | s3 |
Expected output:
| cust_no | month_nr | resource| segment |
|---------|----------|---------|---------|
| 1 | jan-18 | r3 | s1 |
| 2 | may-18 | r4 | s2 |
| 2 | jun-18 | r2 | s2 |
| 2 | aug-18 | r3 | s3 |
| 2 | sep-18 | r2 | s4 |
| 3 | sep-18 | r7 | s2 |
| 3 | oct-18 | r9 | s1 |
| 3 | nov-18 | r2 | s3 |
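I want to filter out customer records where the value of a particular column (segment) stays unchanged for more than 2 consecutive rows, keeping only the first row of such a run in the output. The sample data above shows the input and the output I expect.
Any suggestions?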
Answer 0: (score: 0)
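I'm fairly sure Teradata supports window functions. This query should produce a rowset containing only the rows you want to remove: for each customer, the rows that sit inside a run of more than two identical segments, other than the first row of that run. You can turn it into a delete query of the form "delete where the primary key column is in (this query)":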
SELECT cust_no, month_nr, resource, segment
FROM
(
  SELECT
    t.*,
    -- segment of the previous row, the row before that, and the next row, per customer in month order
    LAG(segment, 1) OVER(PARTITION BY cust_no ORDER BY CAST(month_nr AS DATE FORMAT 'mmm-yy')) AS prevsegment,
    LAG(segment, 2) OVER(PARTITION BY cust_no ORDER BY CAST(month_nr AS DATE FORMAT 'mmm-yy')) AS prevprevsegment,
    LEAD(segment, 1) OVER(PARTITION BY cust_no ORDER BY CAST(month_nr AS DATE FORMAT 'mmm-yy')) AS nextsegment
  FROM your_table t  -- placeholder: substitute your real table name
) a
WHERE
  (segment = prevsegment AND segment = prevprevsegment) OR  -- "Previous Two" rule
  (segment = prevsegment AND segment = nextsegment)         -- "Either Side" rule
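The query looks at the current segment and compares it with the previous one and the one before that (I'll call this the "Previous Two" rule). It also compares the current value with the previous and the next one (I'll call this the "Either Side" rule).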
For a segment sequence like 1,2,2,3,3,3,4,4,4,4, this is how the logic works:
1 - don't touch
2 - don't touch
2 - don't touch
3 - don't touch
3 - remove because of Either Side rule
3 - remove because of Previous Two rule
4 - don't touch
4 - either side
4 - either side
4 - previous two
And so on.
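The two rules can also be checked outside the database. Below is a minimal Python sketch, not Teradata code, that applies the same comparisons to a list of segments that is already in month order for one customer. The function name `flag_rows` is made up for this illustration, and it assumes segments are never NULL. It checks the Either Side rule first so that rows matching both rules get the same label as in the walk-through.

```python
from typing import List, Optional


def flag_rows(segments: List[str]) -> List[Optional[str]]:
    """Return, per row, the rule that marks it for removal, or None to keep it.

    Hypothetical helper: `segments` is one customer's segments in month order.
    """
    flags: List[Optional[str]] = []
    for i, seg in enumerate(segments):
        prev = segments[i - 1] if i >= 1 else None                # LAG(segment, 1)
        prev_prev = segments[i - 2] if i >= 2 else None           # LAG(segment, 2)
        nxt = segments[i + 1] if i + 1 < len(segments) else None  # LEAD(segment, 1)
        if seg == prev and seg == nxt:
            flags.append("either side")
        elif seg == prev and seg == prev_prev:
            flags.append("previous two")
        else:
            flags.append(None)
    return flags


# Segments of customer 2 and customer 1 from the question's sample data.
print(flag_rows(["s2", "s2", "s3", "s4", "s4", "s4"]))
# [None, None, None, None, 'either side', 'previous two']
print(flag_rows(["s1", "s1", "s1", "s1", "s1"]))
# [None, 'either side', 'either side', 'either side', 'previous two']
```

Rows flagged `None` are the ones that stay, which matches the expected output in the question: the pair of s2 rows survives, while only the first row of each longer run is kept.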
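The query above therefore returns the rows you need to remove. Turning it into a delete is left as an exercise, because I don't know what your primary key is.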
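The only thing I'm unsure about is what Teradata does when you cast a date string that has no day component to a date. You may have to concatenate '01-' onto your month_nr and adjust the date format accordingly. Don't store dates as strings! Even where you only want the month, store it as a DATE set to the 1st of that month.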
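To see why the string form is a problem for ordering, here is a small Python sketch (an illustration only, it says nothing about how Teradata itself parses the value). The helper `month_to_date` is a made-up name. It prepends "01-" as suggested above and assumes English month abbreviations in the active locale.

```python
from datetime import date, datetime


def month_to_date(month_nr: str) -> date:
    """Parse a string such as 'jan-18' as the first day of that month."""
    # Prepend a day so the string is a complete date before parsing.
    return datetime.strptime("01-" + month_nr, "%d-%b-%y").date()


months = ["jan-18", "feb-18", "mar-18", "apr-18", "jun-18"]

# Sorting the raw strings is alphabetical, not chronological.
as_strings = sorted(months)
print(as_strings)  # ['apr-18', 'feb-18', 'jan-18', 'jun-18', 'mar-18']

# Sorting by the parsed date restores calendar order.
as_dates = sorted(months, key=month_to_date)
print(as_dates)    # ['jan-18', 'feb-18', 'mar-18', 'apr-18', 'jun-18']
```

The window functions in the query depend on this chronological order, which is why the ORDER BY has to work on a real date rather than on the raw string.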