Conditionally relabel row values

Time: 2019-09-16 13:29:04

Tags: sql amazon-redshift

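I want to write a script that looks at the values in data_id and data_raw_digits. If the values in these columns are the same, take the first non-null value from the data_user_name column and relabel all rows associated with that data_id with the same value.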

Here is what I currently have:

data_id   data_raw_digits    data_user_name    data_ended_at    event_sequence
  1            0000               abc             112                 1
  1            0000                                                   2
  1            0000                                                   3
  1            0000                                                   4
  2            1111                                                   1
  2            1111               ccc             212                 2
  3            2222                                                   1
  3            2222               ddd                                 2 
  3            2222                               303                 3

Desired output:

data_id   data_raw_digits    data_user_name    data_ended_at    event_sequence
  1            0000               abc             112                 1
  1            0000               abc             112                 2
  1            0000               abc             112                 3
  1            0000               abc             112                 4
  2            1111               ccc             212                 1
  2            1111               ccc             212                 2
  3            2222               ddd             303                 1
  3            2222               ddd             303                 2 
  3            2222               ddd             303                 3
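To pin down the rule the desired output implies, here is a minimal Python sketch (plain Python, not SQL, and not the asker's script). It assumes that "first" means the row with the lowest event_sequence in each (data_id, data_raw_digits) group, and gives every row of the group that group's first non-null data_user_name and data_ended_at. The names ROWS, first_non_null and relabel are hypothetical.

```python
from collections import defaultdict

# Sample rows from the question:
# (data_id, data_raw_digits, data_user_name, data_ended_at, event_sequence)
ROWS = [
    (1, "0000", "abc", 112, 1),
    (1, "0000", None, None, 2),
    (1, "0000", None, None, 3),
    (1, "0000", None, None, 4),
    (2, "1111", None, None, 1),
    (2, "1111", "ccc", 212, 2),
    (3, "2222", None, None, 1),
    (3, "2222", "ddd", None, 2),
    (3, "2222", None, 303, 3),
]


def first_non_null(values):
    """Return the first value that is not None, or None if there is none."""
    return next((v for v in values if v is not None), None)


def relabel(rows):
    """Give every row of a (data_id, data_raw_digits) group the group's first
    non-null data_user_name and data_ended_at, ordered by event_sequence."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)

    result = []
    for members in groups.values():
        members.sort(key=lambda r: r[4])  # order each group by event_sequence
        name = first_non_null(r[2] for r in members)
        ended = first_non_null(r[3] for r in members)
        result.extend((r[0], r[1], name, ended, r[4]) for r in members)
    # Sort by data_id, data_raw_digits, event_sequence for a readable output.
    return sorted(result, key=lambda r: (r[0], r[1], r[4]))


if __name__ == "__main__":
    for row in relabel(ROWS):
        print(row)
```

Unlike COALESCE-based SQL, this overwrites every value in the group, which is equivalent on the sample data because each group has at most one non-null value per column.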

3 answers:

Answer 0 (score: 0)

I would proceed as follows:

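  • For each column you want to process (data_user_name and data_ended_at), use a window function in a subquery to rank the non-null records within each group of records that share the same data_id and data_raw_digits
  • LEFT JOIN those results with the original table, and use COALESCE to replace null values with the value from the first record of the corresponding group

Query: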

Demo on DB Fiddle

SELECT
    t.data_id,
    t.data_raw_digits,
    COALESCE(t.data_user_name, t_user_name.data_user_name) data_user_name,
    COALESCE(t.data_ended_at, t_ended_at.data_ended_at) data_ended_at,
    t.event_sequence
FROM mytable t
LEFT JOIN (
    SELECT
        t.*,
        ROW_NUMBER() OVER(PARTITION BY data_id, data_raw_digits ORDER BY event_sequence) rn
    FROM mytable t
    WHERE data_user_name IS NOT NULL
) t_user_name
    ON t_user_name.rn = 1
    AND t_user_name.data_id = t.data_id 
    AND t_user_name.data_raw_digits = t.data_raw_digits
LEFT JOIN (
    SELECT
        t.*,
        ROW_NUMBER() OVER(PARTITION BY data_id, data_raw_digits ORDER BY event_sequence) rn
    FROM mytable t
    WHERE data_ended_at IS NOT NULL
) t_ended_at
    ON t_ended_at.rn = 1
    AND t_ended_at.data_id = t.data_id 
    AND t_ended_at.data_raw_digits = t.data_raw_digits;

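Note: I tested this in a MySQL fiddle because, as far as I know, there is no public Athena fiddle available online; however, this is fairly standard SQL syntax that should work on most RDBMSs, including yours.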
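To check the query without a fiddle, the sketch below runs it against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for Redshift here: it assumes SQLite 3.25 or later (needed for ROW_NUMBER()), the column types in CREATE TABLE are guesses, and the trailing ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Sample rows from the question:
# (data_id, data_raw_digits, data_user_name, data_ended_at, event_sequence)
ROWS = [
    (1, "0000", "abc", 112, 1),
    (1, "0000", None, None, 2),
    (1, "0000", None, None, 3),
    (1, "0000", None, None, 4),
    (2, "1111", None, None, 1),
    (2, "1111", "ccc", 212, 2),
    (3, "2222", None, None, 1),
    (3, "2222", "ddd", None, 2),
    (3, "2222", None, 303, 3),
]

# The answer's query; only the final ORDER BY is new.
QUERY = """
SELECT
    t.data_id,
    t.data_raw_digits,
    COALESCE(t.data_user_name, t_user_name.data_user_name) data_user_name,
    COALESCE(t.data_ended_at, t_ended_at.data_ended_at) data_ended_at,
    t.event_sequence
FROM mytable t
LEFT JOIN (
    SELECT
        t.*,
        ROW_NUMBER() OVER(PARTITION BY data_id, data_raw_digits ORDER BY event_sequence) rn
    FROM mytable t
    WHERE data_user_name IS NOT NULL
) t_user_name
    ON t_user_name.rn = 1
    AND t_user_name.data_id = t.data_id
    AND t_user_name.data_raw_digits = t.data_raw_digits
LEFT JOIN (
    SELECT
        t.*,
        ROW_NUMBER() OVER(PARTITION BY data_id, data_raw_digits ORDER BY event_sequence) rn
    FROM mytable t
    WHERE data_ended_at IS NOT NULL
) t_ended_at
    ON t_ended_at.rn = 1
    AND t_ended_at.data_id = t.data_id
    AND t_ended_at.data_raw_digits = t.data_raw_digits
ORDER BY t.data_id, t.event_sequence
"""


def run(rows):
    """Load rows into an in-memory mytable and return the query's result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (data_id INTEGER, data_raw_digits TEXT, "
            "data_user_name TEXT, data_ended_at INTEGER, event_sequence INTEGER)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run(ROWS):
        print(row)
```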

Answer 1 (score: 0)

I think you can do this with window functions:

select data_id, data_raw_digits,
       max(data_user_name) over (partition by data_id, data_raw_digits) as data_user_name,
       max(data_ended_at) over (partition by data_id, data_raw_digits) as data_ended_at,
       row_number() over (partition by data_id, data_raw_digits order by data_id) as event_sequence
from t;

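Note: in the result set, only event_sequence distinguishes one row from another. The catch is that the original row order is not preserved, but there is no way to tell the difference.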

SQL tables represent unordered sets. There is no ordering unless a column explicitly captures it, and you don't seem to have such a column.
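A quick way to see how this behaves on the sample data is to run it in SQLite through Python's sqlite3 module (SQLite 3.25 or later assumed for window functions; the table name t and the column types are assumptions, and the query uses data_ended_at as the column name). Because every row in a group ends up with identical data_user_name and data_ended_at, the arbitrary order inside row_number() does not change the sorted result here. Note that max() returns the largest non-null value, not the first one, so it only matches "first non-null" when each group holds at most one distinct value, as in the sample.

```python
import sqlite3

# Sample rows from the question:
# (data_id, data_raw_digits, data_user_name, data_ended_at, event_sequence)
ROWS = [
    (1, "0000", "abc", 112, 1),
    (1, "0000", None, None, 2),
    (1, "0000", None, None, 3),
    (1, "0000", None, None, 4),
    (2, "1111", None, None, 1),
    (2, "1111", "ccc", 212, 2),
    (3, "2222", None, None, 1),
    (3, "2222", "ddd", None, 2),
    (3, "2222", None, 303, 3),
]

QUERY = """
select data_id, data_raw_digits,
       max(data_user_name) over (partition by data_id, data_raw_digits) as data_user_name,
       max(data_ended_at) over (partition by data_id, data_raw_digits) as data_ended_at,
       row_number() over (partition by data_id, data_raw_digits order by data_id) as event_sequence
from t
"""


def run(rows):
    """Load rows into an in-memory table t and return the sorted result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (data_id INTEGER, data_raw_digits TEXT, "
            "data_user_name TEXT, data_ended_at INTEGER, event_sequence INTEGER)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
        # Sort in Python because the query itself has no ORDER BY.
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run(ROWS):
        print(row)
```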

Answer 2 (score: 0)

How about a good old-fashioned self-join with aggregation?

SELECT
    a.data_id AS data_id,
    a.data_raw_digits AS data_raw_digits,
    a.event_sequence AS event_sequence,
    max(b.data_user_name) AS data_user_name,
    max(b.data_ended_at) AS data_ended_at
FROM your_table a
LEFT JOIN your_table b
    ON a.data_id = b.data_id
    AND a.data_raw_digits = b.data_raw_digits
GROUP BY a.data_id, a.data_raw_digits, a.event_sequence;
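The sketch below runs this self-join in SQLite through Python's sqlite3 module, joining on both data_id and data_raw_digits so that rows from different data_id groups that happen to share the same digits are never mixed. The table name your_table and its column types are placeholders, and the ORDER BY is added only for a deterministic output. Each row is joined to every row of its group, so the intermediate result grows with the square of the group size, and max() again picks the largest non-null value rather than the first one.

```python
import sqlite3

# Sample rows from the question:
# (data_id, data_raw_digits, data_user_name, data_ended_at, event_sequence)
ROWS = [
    (1, "0000", "abc", 112, 1),
    (1, "0000", None, None, 2),
    (1, "0000", None, None, 3),
    (1, "0000", None, None, 4),
    (2, "1111", None, None, 1),
    (2, "1111", "ccc", 212, 2),
    (3, "2222", None, None, 1),
    (3, "2222", "ddd", None, 2),
    (3, "2222", None, 303, 3),
]

QUERY = """
SELECT
    a.data_id AS data_id,
    a.data_raw_digits AS data_raw_digits,
    a.event_sequence AS event_sequence,
    max(b.data_user_name) AS data_user_name,
    max(b.data_ended_at) AS data_ended_at
FROM your_table a
LEFT JOIN your_table b
    ON a.data_id = b.data_id
    AND a.data_raw_digits = b.data_raw_digits
GROUP BY a.data_id, a.data_raw_digits, a.event_sequence
ORDER BY a.data_id, a.event_sequence
"""


def run(rows):
    """Load rows into an in-memory your_table and return the query's result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE your_table (data_id INTEGER, data_raw_digits TEXT, "
            "data_user_name TEXT, data_ended_at INTEGER, event_sequence INTEGER)"
        )
        conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Column order: data_id, data_raw_digits, event_sequence, data_user_name, data_ended_at
    for row in run(ROWS):
        print(row)
```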