Hive - Copy the value from the previous row

Time: 2016-04-06 18:21:34

Tags: null hive copy records multiple-records

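I am trying to write a Hive query that, when the value in the current field is NULL, copies the value from the previous row in the same column. If the current value is not NULL, it should be kept. For example, given the following input: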

    company    empId   first_name   last_name   job_code   department   start_date
    110        500400   ABC          XYZ         300        101         01/20/2015
    110        500400   Null         Null        305        105         04/02/2015
    110        500400   ABC1         Null        Null       Null        15/02/1015
    110        500400   Null         XYZ1        307        Null        01/03/2015

The output should look like this:

    company    empId   first_name   last_name   job_code   department   start_date
    110        500400   ABC          XYZ         300        101         01/20/2015
    110        500400   ABC          XYZ         305        105         04/02/2015
    110        500400   ABC1         XYZ         305        105         15/02/1015
    110        500400   ABC1         XYZ1        307        105         01/03/2015

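I tried queries using the last_value and lag functions, but neither seems to work. With last_value, it only works when the number of rows is limited. When I run it on a large data set, it fails (the map-reduce job never completes). Here is the query I am trying: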

    select
    company, empId, start_date,
    last_value(last_name, true) over (partition by company, empId order by start_date) as last_name,
    last_value(first_name, true) over (partition by company, empId order by start_date) as first_name,
    last_value(department, true) over (partition by company, empId order by start_date) as department,
    last_value(job_code, true) over (partition by company, empId order by start_date) as job_code
    from samples.z_sample_test
    order by start_date;
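One thing that may be worth checking in the query above, offered as an untested sketch rather than a confirmed fix: the trailing `order by start_date` asks Hive for a total ordering of the whole result, which is normally handled by a single reducer and could explain a job that stalls on large inputs. The variant below drops that global sort and spells out the window frame explicitly. It assumes `start_date` sorts in the intended chronological order within each employee, which a string in `MM/DD/YYYY` form generally does not.

```sql
-- Sketch only: same table and columns as in the question.
-- Assumption: start_date sorts in the intended order within each employee.
select
  company,
  empId,
  start_date,
  -- the second argument `true` tells last_value to skip NULLs
  last_value(first_name, true) over (
    partition by company, empId
    order by start_date
    rows between unbounded preceding and current row
  ) as first_name,
  last_value(last_name, true) over (
    partition by company, empId
    order by start_date
    rows between unbounded preceding and current row
  ) as last_name,
  last_value(job_code, true) over (
    partition by company, empId
    order by start_date
    rows between unbounded preceding and current row
  ) as job_code,
  last_value(department, true) over (
    partition by company, empId
    order by start_date
    rows between unbounded preceding and current row
  ) as department
from samples.z_sample_test;
-- No global ORDER BY: rows come back in no guaranteed order.
```

If sorted output is needed, sorting downstream or on a smaller result may be cheaper than a global `order by` over the full table.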

With lag, only one record gets updated. None of the subsequent records are updated. Here is the query I am using:

    select
    c.company,
    c.empId,
    c.start_date,
    if(c.first_name is null, lag(c.first_name, 1) over (order by c.start_date), c.first_name) as first_name,
    if(c.last_name is null, lag(c.last_name, 1) over (order by c.start_date), c.last_name) as last_name,
    if(c.job_code is null, lag(c.job_code, 1) over (order by c.start_date), c.job_code) as job_code,
    if(c.department is null, lag(c.department, 1) over (order by c.start_date), c.department) as department
    from samples.z_sample_test c
    left join samples.z_sample_test p
    on (c.company = p.company and c.empId = p.empId)
    group by c.company, c.empId, c.start_date, c.last_name, c.first_name, c.job_code, c.department order by c.start_date;
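A note on why the lag attempt fills only one row, with a hedged alternative: `lag(col, 1)` reads the original value exactly one row back, so when that row is also NULL the result stays NULL, and without `partition by` the window runs across all employees. A common workaround, sketched here for `first_name` only, counts the non-NULL values seen so far to number each run of rows, then takes the single non-NULL value within each run. The column alias `grp` and the subquery alias `t` are made up for this illustration, and the query has not been run against the data in the question.

```sql
-- Sketch only: forward-fill first_name without relying on lag.
-- Assumption: start_date sorts in the intended order within each employee.
select
  company,
  empId,
  start_date,
  -- each run holds at most one non-NULL first_name, so max() returns it
  max(first_name) over (partition by company, empId, grp) as first_name
from (
  select
    company,
    empId,
    start_date,
    first_name,
    -- count() ignores NULLs, so grp only increases on rows that carry a value
    count(first_name) over (
      partition by company, empId
      order by start_date
      rows between unbounded preceding and current row
    ) as grp
  from samples.z_sample_test
) t;
```

For the sample rows in the order shown (ABC, NULL, ABC1, NULL), the running count is 1, 1, 2, 2, which yields ABC, ABC, ABC1, ABC1. Each additional column to be filled would need its own counter.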

I would appreciate any help with this.

0 answers:

No answers