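I am using custom functions in Cascading to run DQ (data quality) checks. The functions set an indicator, and at the end I will filter the pipe based on that indicator.
I wrote two functions for this. In the code below, field 'A' is a string that needs a null check, and field 'B' is a code that needs a decimal check. The indicator 'Ind' is set according to the result of the quality check; it is passed into the IndicatorNull / IndicatorDecimal functions and set inside them.
But I get an error with this code. I cannot pass fields 'A'/'Ind' and fields 'B'/'Ind' to the first and second filter of the same pipe.
Am I missing something here? Please let me know how this should be handled. Thanks!
Here is part of the code: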
Scheme inscheme = new TextDelimited(new Fields("A","B","Ind"),",");
Tap sourceTap = new Hfs(inscheme, infile);
Tap sinkTap = new Hfs(inscheme, outfile);
Pipe BooleanPipe = new Pipe ("BooleanPipe");
Fields findreturnNull = new Fields( "A","Ind" );
Fields findreturnDecimal = new Fields("B", "Ind" );
BooleanPipe = new Each( BooleanPipe, findreturnNull, new IndicatorNull(findreturnNull), Fields.RESULTS );
BooleanPipe = new Each( BooleanPipe, findreturnDecimal, new IndicatorDecimal(findreturnDecimal), Fields.RESULTS );
Here is the error I get:
Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [[BooleanPipe][first.Boolean.main(Boolean.java:48)] unable to resolve argument selector: [{2}:'B', 'Ind'], with incoming: [{2}:'A', 'Ind']]
at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:577)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:286)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at cascading.flow.FlowConnector.connect(FlowConnector.java:450)
at cascading.flow.FlowConnector.connect(FlowConnector.java:426)
at cascading.flow.FlowConnector.connect(FlowConnector.java:275)
at cascading.flow.FlowConnector.connect(FlowConnector.java:220)
at cascading.flow.FlowConnector.connect(FlowConnector.java:202)
at first.Boolean.main(Boolean.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Answer 0 (score: 1)
The problem is the Fields.RESULTS argument.
If you look at the flow:
+----------+-----------------+-----------------+------------------------------------------+
| Command  | Incoming Fields | Outgoing Fields | Reason                                   |
+----------+-----------------+-----------------+------------------------------------------+
| Input    | "A", "B", "Ind" | "A", "B", "Ind" | Input, TextDelimited                     |
+----------+-----------------+-----------------+------------------------------------------+
| 1st Each | "A", "B", "Ind" | "A", "Ind"      | Fields.RESULTS keeps only the function's |
|          |                 |                 | result fields; the rest are discarded.   |
+----------+-----------------+-----------------+------------------------------------------+
| 2nd Each | "A", "Ind"      | ERROR           | IndicatorDecimal() is looking for field  |
|          |                 |                 | "B", which no longer exists in the pipe. |
+----------+-----------------+-----------------+------------------------------------------+
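Because each function's input and output fields are the same, the solution is to use Fields.REPLACE instead of Fields.RESULTS. Fields.REPLACE keeps all incoming fields and replaces the values of the same-named fields with the function's results, so "B" is still in the pipe when the second Each runs.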
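To make the field flow concrete, here is a minimal plain-Java sketch that only models the field bookkeeping described above. It does not use the Cascading API: the class `FieldFlowSketch`, the `Selector` enum, and the `each` helper are hypothetical names invented for illustration, and the sketch assumes each function declares the same fields it receives, as in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the field bookkeeping only; this is NOT Cascading code.
class FieldFlowSketch {

    // Output selector modes modelled here (named after Fields.RESULTS / Fields.REPLACE).
    enum Selector { RESULTS, REPLACE }

    // Returns the fields leaving one Each, or throws if the arguments cannot be resolved.
    static List<String> each(List<String> incoming, List<String> arguments, Selector selector) {
        if (!incoming.containsAll(arguments)) {
            throw new IllegalStateException(
                "unable to resolve argument selector: " + arguments
                    + ", with incoming: " + incoming);
        }
        // Assumption: the function declares the same fields it receives.
        List<String> declared = arguments;
        if (selector == Selector.RESULTS) {
            return new ArrayList<>(declared);   // only the result fields survive
        }
        return new ArrayList<>(incoming);       // all fields kept; same-named values replaced
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("A", "B", "Ind");
        List<String> nullArgs = Arrays.asList("A", "Ind");
        List<String> decimalArgs = Arrays.asList("B", "Ind");

        // REPLACE: both steps can resolve their argument fields.
        List<String> afterFirst = each(input, nullArgs, Selector.REPLACE);
        System.out.println(each(afterFirst, decimalArgs, Selector.REPLACE));

        // RESULTS: the first step drops "B", so the second step fails.
        List<String> narrowed = each(input, nullArgs, Selector.RESULTS);
        try {
            each(narrowed, decimalArgs, Selector.RESULTS);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the question's assembly, the corresponding change would be to pass Fields.REPLACE instead of Fields.RESULTS as the last argument of both Each calls.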
Reference: Fields Sets