LATERAL VIEW functionality in Cascading

Date: 2014-09-02 11:00:01

Tags: hadoop hive cascading

My table looks like this:

TableName: myTab

+----+---------------------+
| ID |        Codes        |
+----+---------------------+
| 1  | ABC,DEF,GHI,JLK,MNO |
+----+---------------------+

I am developing a Cascading application that should transform the table above into the following:

+----+---------------------+------+
| ID |        Codes        | code |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | ABC  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | DEF  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | GHI  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | JLK  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | MNO  |
+----+---------------------+------+

In Hive, this is easy to do with LATERAL VIEW:

SELECT
    ID, Codes, code
FROM
    myTab LATERAL VIEW explode(split(Codes, ',')) codesTab AS code;
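Conceptually, the LATERAL VIEW repeats each input row once per code and appends that code as a new column. The following is a plain-Java sketch of that row explosion, with no Hive or Cascading involved; the class and method names are hypothetical, and splitting on commas is assumed because Codes holds a comma-separated string.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical plain-Java illustration of a LATERAL VIEW-style explode:
 * every input row [ID, Codes] becomes one output row [ID, Codes, code]
 * per comma-separated element of Codes.
 */
public class LateralViewSketch {

    public static List<List<String>> explodeCodes(List<List<String>> rows) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            String codes = row.get(1);
            // String.split interprets its argument as a regex; "," matches a literal comma.
            for (String code : codes.split(",")) {
                List<String> joined = new ArrayList<>(row); // keep ID and Codes
                joined.add(code);                            // append the exploded value
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> myTab = List.of(List.of("1", "ABC,DEF,GHI,JLK,MNO"));
        for (List<String> row : explodeCodes(myTab)) {
            System.out.println(row);
        }
    }
}
```

The input columns are copied into every output row, which is exactly the duplication visible in the target table.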

But I want to do the same thing in Cascading. Is there a way?

Answer (score: 0)

This can be done with a custom Function (there may be other ways as well). The function just adds a new Tuple to the OutputCollector for each token.

For example:

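import static com.google.common.base.Preconditions.checkArgument;

import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

/**
 * Emits one output tuple per comma-separated token found in the argument
 * fields, similar to Hive's LATERAL VIEW explode(split(...)).
 */
public class TestLateralView extends BaseOperation<Void> implements Function<Void> {
    private static final long serialVersionUID = 1L;

    public TestLateralView(Fields fields) {
        super(fields); // declares the single output field, e.g. "code"
        checkArgument(fields.size() == 1);
    }

    @Override
    public void operate(@SuppressWarnings("rawtypes") FlowProcess flowProcess, FunctionCall<Void> functionCall) {
        Tuple tuple = functionCall.getArguments().getTuple();

        // Join all argument values with commas so that a multi-field
        // argument selector is treated as one list of tokens.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tuple.size(); i++) {
            String value = tuple.getString(i);
            if (value != null) { // skip null fields instead of emitting "null"
                sb.append(value).append(",");
            }
        }

        // split(",") drops the trailing empty string left by the last comma.
        for (String token : sb.toString().split(",")) {
            if (!token.isEmpty()) { // ignore empty input and ",," gaps
                // Each add() call produces a separate output row.
                functionCall.getOutputCollector().add(new Tuple(token));
            }
        }
    }
}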

With the function above, I get the expected output.
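To check the tokenizing logic without running a flow, the join-then-split step of operate() can be exercised in plain Java. This is a hypothetical sketch: a java.util.function.Consumer stands in for Cascading's output collector, and no Cascading classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical plain-Java stand-in for the body of TestLateralView.operate():
 * join the argument values with commas, split on commas, and hand each
 * token to a collector (here a Consumer instead of Cascading's collector).
 */
public class OperateSketch {

    public static void operate(List<String> argumentValues, Consumer<String> collector) {
        StringBuilder sb = new StringBuilder();
        for (String value : argumentValues) {
            sb.append(value).append(",");
        }
        // String.split drops trailing empty strings, so the final comma is harmless.
        for (String token : sb.toString().split(",")) {
            collector.accept(token); // one call per token = one output row
        }
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        operate(List.of("ABC,DEF,GHI,JLK,MNO"), emitted::add);
        System.out.println(emitted); // prints [ABC, DEF, GHI, JLK, MNO]
    }
}
```

Because every argument value is joined before splitting, passing several fields (for example "A,B" and "C") yields a single flat token list A, B, C.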

In the assembly, the function can be applied like this:

pipe = new Each(pipe, CODES, new TestLateralView(CODE), Fields.ALL);
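Here CODES and CODE are presumably field constants such as new Fields("Codes") and new Fields("code") (an assumption; the answer does not show them). CODES is the argument selector, so only the Codes value reaches the function, and Fields.ALL is the output selector, so each emitted tuple is appended to the full incoming tuple (ID, Codes); Fields.RESULTS would keep only code. The plain-Java sketch below mimics that combination step; the enum, class and method names are hypothetical and no Cascading classes are used.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of how an Each pipe combines an incoming tuple
 * with the tuples its Function emits. ALL and RESULTS mimic Cascading's
 * Fields.ALL and Fields.RESULTS output selectors.
 */
public class OutputSelectorSketch {

    public enum Selector { ALL, RESULTS }

    public static List<List<String>> select(List<String> incoming,
                                            List<List<String>> results,
                                            Selector selector) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> result : results) {
            List<String> row = new ArrayList<>();
            if (selector == Selector.ALL) {
                row.addAll(incoming); // keep the incoming fields (ID, Codes)
            }
            row.addAll(result);       // append the declared field (code)
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> incoming = List.of("1", "ABC,DEF,GHI,JLK,MNO");
        List<List<String>> results = List.of(List.of("ABC"), List.of("DEF"));
        System.out.println(select(incoming, results, Selector.ALL));
        System.out.println(select(incoming, results, Selector.RESULTS));
    }
}
```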