Pig: ILLUSTRATE command fails even though DUMP works

Posted: 2017-10-15 21:24:45

Tags: mapreduce apache-pig

This is the schema of my data:

movies (movieId:int, title:chararray, genre:chararray /* a pipe-separated list of strings */)

tags (userId:int, movieId:int, tag:chararray, timestamp:long)

The query is: the total number of tags for each action movie.

This is the Pig script I wrote to answer the query:

tags = LOAD '<path_to_dataset>/ml-20m/tags.csv'
       USING PigStorage(',') AS (userId:int, movieId:int, tag:chararray, timestamp:long);

movies = LOAD '<path_to_dataset>/ml-20m/movies.csv' 
         USING PigStorage(',') AS (movieId:int, title:chararray, genre: chararray);

actionMovies = FILTER movies BY (genre MATCHES '.*Action.*');
actionMoviesTags = JOIN actionMovies BY movieId, tags BY movieId;
actionMovieTagsGrouped = GROUP actionMoviesTags BY actionMovies::movieId;
actionMovieNumberTags = FOREACH actionMovieTagsGrouped GENERATE group AS movieId, COUNT(actionMoviesTags.tags::tag) AS NumberTags;
DUMP actionMovieNumberTags;


Part of the result:
   ....
(1196,594)
(1198,472)
(1200,471)
(1208,379)
(1209,201)
(1210,454)
(1215,249)
(1224,36)
(1261,180)
(1264,19)
(1274,273)
(1275,106)
(1287,113)
(1291,331)
(1304,120)
(1320,134)
(1356,133)
  ....

But when I try to run the ILLUSTRATE command, I get an error: ERROR 1071: Cannot convert a tuple to an Integer

This is the error log file:

Pig Stack Trace
---------------
ERROR 1071: Cannot convert a tuple to an Integer

java.io.IOException: ExecException
    at org.apache.pig.PigServer.getExamples(PigServer.java:1393)
    at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:840)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:825)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:392)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:564)
    at org.apache.pig.Main.main(Main.java:175)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: actionMoviesTags: Local Rearrange[tuple]{int}(false) - scope-112 Operator Key: scope-112): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: actionMovies: Filter[bag] - scope-90 Operator Key: scope-90): org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a tuple to an Integer
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNextTuple(POUnion.java:167)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:280)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:221)
    at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257)
    at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:238)
    at org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:104)
    at org.apache.pig.pen.LineageTrimmingVisitor.<init>(LineageTrimmingVisitor.java:99)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:184)
    at org.apache.pig.PigServer.getExamples(PigServer.java:1390)
    ... 14 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: actionMovies: Filter[bag] - scope-90 Operator Key: scope-90): org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a tuple to an Integer
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:90)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
    ... 27 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a tuple to an Integer
    at org.apache.pig.data.DataType.toInteger(DataType.java:781)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNextInteger(POCast.java:503)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:347)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:406)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:323)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
    ... 29 more
================================================================================

I believe my script answers the query (DUMP runs!). I have searched all over the web for a fix, but could not find one.
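
For reference, here is a minimal sketch in plain Python of the computation the script is meant to perform (filter, join, group, count), using a few made-up rows. The sample rows and the helper name count_action_tags are illustrative assumptions, not part of the dataset or of Pig. It also assumes each movieId appears only once in movies.

```python
import re
from collections import Counter

# Made-up sample rows that follow the schemas in the question
# (illustrative only, not taken from the ml-20m dataset).
movies = [
    (1, "Movie A", "Action|Adventure"),
    (2, "Movie B", "Comedy"),
    (3, "Movie C", "Action"),
]
tags = [
    (10, 1, "fast", 1000),
    (11, 1, "loud", 1001),
    (12, 2, "funny", 1002),
    (13, 3, "classic", 1003),
]


def count_action_tags(movies, tags):
    """Hypothetical helper: number of tags per movie whose genre contains 'Action'."""
    # Mirrors: FILTER movies BY (genre MATCHES '.*Action.*')
    # MATCHES has to match the whole string, hence fullmatch.
    action_ids = {
        movie_id
        for movie_id, _title, genre in movies
        if re.fullmatch(r".*Action.*", genre)
    }
    # Mirrors: JOIN on movieId, GROUP BY movieId, COUNT(tag).
    # Null tags are skipped, as COUNT does in Pig.
    counts = Counter(
        movie_id
        for _user_id, movie_id, tag, _timestamp in tags
        if movie_id in action_ids and tag is not None
    )
    return sorted(counts.items())


result = count_action_tags(movies, tags)
print(result)
```

Each pair in the result has the same shape as a DUMP output row: (movieId, NumberTags).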

Any help would be greatly appreciated.

PS: tags.csv and movies.csv have no header line, only data rows.

I am using the Pig shell, Grunt.

0 answers:

No answers yet