My table looks like this:
TableName: myTab
+----+---------------------+
| ID | Codes               |
+----+---------------------+
| 1  | ABC,DEF,GHI,JLK,MNO |
+----+---------------------+
I am developing a Cascading application that should transform the table above into the following:
+----+---------------------+------+
| ID | Codes               | code |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | ABC  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | DEF  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | GHI  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | JLK  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | MNO  |
+----+---------------------+------+
If I were using Hive, this could be done easily with LATERAL VIEW:
SELECT
    ID, Codes, code
FROM
    -- explode() expects an array, so the comma-separated string is split first
    myTab LATERAL VIEW explode(split(Codes, ',')) codesTab AS code
But I want to do the same thing in Cascading. Is there a way?
Answer 0 (score: 0)
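This can be done with a custom Function (there may be other ways as well). The function only needs to add one new tuple to the OutputCollector for each token.
For example: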
import static com.google.common.base.Preconditions.checkArgument;

import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

public class TestLateralView extends BaseOperation<Void> implements Function<Void> {
    private static final long serialVersionUID = 1L;

    // 'fields' declares the single output field (e.g. "code") emitted per token.
    public TestLateralView(Fields fields) {
        super(fields);
        checkArgument(fields.size() == 1);
    }

    @Override
    public void operate(@SuppressWarnings("rawtypes") FlowProcess flowProcess, FunctionCall<Void> functionCall) {
        Tuple tuple = functionCall.getArguments().getTuple();

        // Join all argument values into one comma-separated string.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tuple.size(); i++) {
            sb.append(tuple.getString(i));
            sb.append(",");
        }

        // split(",") drops trailing empty strings, so the extra trailing comma is harmless.
        String[] tokens = sb.toString().split(",");

        // Emit one result tuple per token; each becomes its own output row.
        for (String token : tokens) {
            functionCall.getOutputCollector().add(new Tuple(token));
        }
    }
}
With the function above, I get the expected output.
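In the assembly, the function can be invoked as follows, where CODES and CODE are Fields constants naming the argument column and the new output column: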
pipe = new Each(pipe, CODES, new TestLateralView(CODE), Fields.ALL);
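To make it clearer why the Fields.ALL output selector produces the three columns ID, Codes and code, here is a minimal plain-Java model of what that Each call does to a single row. It is only a sketch and not Cascading code: the class name LateralViewSketch and the method explode are hypothetical, and rows are modelled as simple lists of strings rather than Tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LateralViewSketch {

    /**
     * Models Each(pipe, CODES, function, Fields.ALL) for one input row:
     * every value the function emits is appended to a copy of the full row.
     * (Hypothetical helper, not part of the Cascading API.)
     */
    public static List<List<String>> explode(List<String> row, int codesIndex) {
        List<List<String>> out = new ArrayList<>();
        for (String token : row.get(codesIndex).split(",")) {
            List<String> result = new ArrayList<>(row); // keep ID and Codes
            result.add(token);                          // append the new "code" value
            out.add(result);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", "ABC,DEF,GHI,JLK,MNO");
        for (List<String> r : explode(row, 1)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

Running main prints five rows, one per code, each still carrying the original ID and Codes values, which matches the target table in the question.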