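Error in the Cascading tutorial word count example

Time: 2013-11-19 18:45:13

Tags: java hadoop cascading

I am currently learning Cascading. I am working through the second tutorial on its official website, which is the word count example. I copied the code from it and tried to run it, but it always gives me the following error: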

Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [[token][com.starscriber.cascadingtest.Main.main(Main.java:44)] 
unable to resolve argument selector: [{1}:'text'], with incoming: [{1}:'doc01        A rain shadow is a dry area on the lee back side of a mountainous area.']] at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:576)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:263)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at com.starscriber.cascadingtest.Main.main(Main.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Caused by: cascading.pipe.OperatorException: [token][com.starscriber.cascadingtest.Main.main(Main.java:44)] 
unable to resolve argument selector: [{1}:'text'], with incoming: [{1}:'doc01        A rain shadow is a dry area on the lee back side of a mountainous area.']
at cascading.pipe.Operator.resolveArgumentSelector(Operator.java:345)
at cascading.pipe.Each.outgoingScopeFor(Each.java:368)
at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:628)
at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:610)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:248)
... 8 more

Caused by: cascading.tuple.FieldsResolverException: 
could not select fields: [{1}:'text'], from: [{1}:'doc01        A rain shadow is a dry area on the lee back side of a mountainous area.']
at cascading.tuple.Fields.indexOf(Fields.java:1008)
at cascading.tuple.Fields.select(Fields.java:1064)
at cascading.pipe.Operator.resolveArgumentSelector(Operator.java:341)
... 12 more

What is going on? I copied exactly the same code from the official GitHub repository and did not change anything...

String docPath = args[0];
String wcPath = args[1];

Properties properties = new Properties();          
AppProps.setApplicationJarClass(properties, Main.class);
HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

// create source and sink taps
Tap docTap = new Hfs(new TextDelimited(true, "\t"), docPath);
Tap wcTap = new Hfs(new TextDelimited(true, "\t"), wcPath);

// specify a regex operation to split the "document" text lines into a token stream
Fields token = new Fields("token");
Fields text = new Fields("text");
RegexSplitGenerator splitter = new RegexSplitGenerator(token, "[ \\[\\]\\(\\),.]");
// only returns "token"
Pipe docPipe = new Each("token", text, splitter, Fields.RESULTS);

// determine the word counts
Pipe wcPipe = new Pipe("wc", docPipe);
wcPipe = new GroupBy(wcPipe, token);
wcPipe = new Every(wcPipe, Fields.ALL, new Count(), Fields.ALL);

// connect the taps, pipes, etc., into a flow
FlowDef flowDef = FlowDef.flowDef()
            .setName("wc")
            .addSource(docPipe, docTap)
            .addTailSink(wcPipe, wcTap);

// write a DOT file and run the flow
Flow wcFlow = flowConnector.connect(flowDef);
wcFlow.writeDOT("dot/wc.dot");
wcFlow.complete();

Where is the problem?

This is the input file:

doc01        A rain shadow is a dry area on the lee back side of a mountainous area.
doc02        This sinking, dry air produces a rain shadow, or area in the lee of a mountain with less rain and cloudcover.
doc03        A rain shadow is an area of dry land that lies on the leeward (or downwind) side of a mountain.
doc04        This is known as the rain shadow effect and is the primary cause of leeward deserts of mountain ranges, such as California's Death Valley.
doc05        Two Women. Secrets. A Broken Land. [DVD Australia]

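2 answers:

Answer 0 (score: 1)

Check whether there is a tab character between the two fields, docId and text, in your input file. The program expects the two fields to be separated by a tab, but in your case it reads the entire line into a single field. The exception message shows this: the only incoming field is named after the whole first line of the file, so the 'text' field cannot be resolved.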
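
A minimal plain-Java sketch of that failure, assuming the header line is simply split on the delimiter to obtain the field names. It does not use Cascading itself; the class HeaderCheck and its methods are hypothetical names made up for illustration, and the header "doc_id\ttext" is an assumed example of a valid header.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Cascading: mimics deriving field
// names from the first line of a tab-delimited file.
class HeaderCheck {
    // Split the first line on tabs; each piece becomes a field name.
    static List<String> headerFields(String firstLine) {
        return Arrays.asList(firstLine.split("\t", -1));
    }

    // True if the requested field name appears among the header fields.
    static boolean canResolve(String firstLine, String fieldName) {
        return headerFields(firstLine).contains(fieldName);
    }

    public static void main(String[] args) {
        // First line of the file from the question: spaces, no tab, no header.
        String fromQuestion = "doc01        A rain shadow is a dry area.";
        // Assumed shape of a valid header line (names are an assumption).
        String withHeader = "doc_id\ttext";

        System.out.println(canResolve(fromQuestion, "text")); // prints false
        System.out.println(canResolve(withHeader, "text"));   // prints true
    }
}
```

With the file from the question, the whole first line ends up as one field name, which matches the `incoming: [{1}:'doc01 ...']` part of the exception.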

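Answer 1 (score: 0)

As others have already mentioned, your input file needs to have the same header line. Instead of copying the code, try cloning the repository, so that you do not run into errors related to the file format.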
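
If cloning is not an option, the pasted file can be repaired by hand or with a small script. Below is a rough plain-Java sketch, assuming each line starts with a document id followed by whitespace and then the text. The class InputFixer is a hypothetical helper, and the header name "doc_id" is an assumption; only "text" is required by the code in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: rewrites space-separated lines as tab-delimited
// lines and prepends a header row.
class InputFixer {
    static List<String> toTabDelimited(List<String> lines) {
        List<String> out = new ArrayList<>();
        // Header row; "doc_id" is an assumed name, "text" is what the flow selects.
        out.add("doc_id\ttext");
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Split on the first run of whitespace only: id, then the rest.
            String[] parts = trimmed.split("\\s+", 2);
            String text = parts.length > 1 ? parts[1] : "";
            out.add(parts[0] + "\t" + text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fixed = toTabDelimited(Arrays.asList(
                "doc01        A rain shadow is a dry area.",
                "doc02        This sinking, dry air produces a rain shadow."));
        for (String line : fixed) {
            System.out.println(line);
        }
    }
}
```

Write the resulting lines back to the input path before running the flow again.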