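Dynamically generating an IN condition in Pig

Time: 2014-08-27 12:47:58

Tags: java apache-pig

I am using Pig 0.12 and I want to build a dynamic IN condition from the values of a relation.

In my Pig script I have a relation named 'm_master'. Running DESCRIBE m_master gives me the following: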

m_master: {m_id: chararray,m_name: chararray,in_dx: chararray,rolled_up_name: chararray,match_code: chararray,match0: chararray,flag_ind: chararray}

Now I want to perform an operation like

UPDATE M_Master SET flag_ind='SE' WHERE Rolled_Up_Name IN (SELECT DISTINCT Rolled_Up_Name FROM M_Master WHERE flag_ind='SE') AND flag_ind='Non SE'

that is, the equivalent of this RDBMS query.

I have already generated the distinct rolled_up_name values from m_master in a relation called distinct_rollup_names

m_master = FOREACH m_master GENERATE m_id, m_name, in_dx, rolled_up_name, match_code, match0, 
    (
        (
            flag_ind == 'Non SE' AND rolled_up_name IN (distinct_rollup_names)
        ) ? 'SE' : flag_ind
    ) as flag_ind;

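How can I use the values of the generated relation in the IN condition? Any suggestions are welcome.

1 Answer:

Answer 0: (score: 0)

Pig does not support an IN clause against another relation the way you expect. Instead, join m_master with itself on the rolled_up_name column, then set the left side's flag_ind to 'SE' when it is 'Non SE' and the right side's flag_ind is 'SE'.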

-- Original m_master schema (as reported by DESCRIBE):
-- m_master: {m_id: chararray,m_name: chararray,in_dx: chararray,rolled_up_name: chararray,match_code: chararray,match0: chararray,flag_ind: chararray}

-- Clone m_master into m_master2
m_master2 = FOREACH m_master GENERATE m_id, m_name, in_dx, rolled_up_name, match_code, match0, flag_ind;

-- We are interested only in SE flag_ind (this works as inner query in your question)
m_master2 = filter m_master2 by flag_ind == 'SE';

-- Now join m_master and m_master2
m_master_self_joined = JOIN m_master BY rolled_up_name LEFT OUTER, m_master2 BY rolled_up_name;

-- Now pick fields from m_master
-- When there is a match with m_master2, set flag_ind to SE
m_master_self_joined2 = FOREACH m_master_self_joined 
                        GENERATE 
                            m_master::m_id,
                            m_master::m_name,
                            m_master::in_dx,
                            m_master::rolled_up_name,
                            m_master::match_code,
                            m_master::match0,
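                            -- m_master2 fields are null when the left outer join found no match
                            ((m_master2::rolled_up_name IS NOT NULL AND m_master::flag_ind == 'Non SE')
                                ? 'SE' : m_master::flag_ind) AS flag_ind;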

-- Several 'SE' rows can share one rolled_up_name, so the join may emit duplicates; keep unique rows only
m_master_self_joined3 = DISTINCT m_master_self_joined2;
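
As a variation on the steps above, the right-hand side of the join can be reduced to one row per rolled_up_name before joining, so the join cannot multiply rows and the final DISTINCT is not needed. This is only a minimal sketch: it assumes m_master is already loaded with the schema shown in the question, and the aliases se_rows, se_names_raw, se_names, joined and m_master_updated, as well as the field name se_name, are made up for illustration.

```pig
-- Rows currently flagged 'SE' (plays the role of the inner SELECT in the SQL query)
se_rows = FILTER m_master BY flag_ind == 'SE';

-- Keep only the join key and de-duplicate it: one row per rolled_up_name
se_names_raw = FOREACH se_rows GENERATE rolled_up_name AS se_name;
se_names = DISTINCT se_names_raw;

-- The left outer join keeps every m_master row; se_name is null when nothing matched
joined = JOIN m_master BY rolled_up_name LEFT OUTER, se_names BY se_name;

-- Flip 'Non SE' to 'SE' only on matched rows; every other flag_ind is left untouched
m_master_updated = FOREACH joined GENERATE
    m_master::m_id           AS m_id,
    m_master::m_name         AS m_name,
    m_master::in_dx          AS in_dx,
    m_master::rolled_up_name AS rolled_up_name,
    m_master::match_code     AS match_code,
    m_master::match0         AS match0,
    ((se_names::se_name IS NOT NULL AND m_master::flag_ind == 'Non SE')
        ? 'SE' : m_master::flag_ind) AS flag_ind;
```

Because se_names holds each rolled_up_name at most once, m_master_updated has exactly as many rows as m_master, and rows that are genuine duplicates in the input are preserved rather than collapsed.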

Hope this helps.