Here is the schema of my data:
movies (movieId:int, title:chararray, genre: chararray /* a pipe separated list of strings*/)
tags (userId:int, movieId:int, tag:chararray, timestamp:long)
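In the schema above, each movies row has three comma-separated fields, and `genre` is a pipe-separated list. The minimal Python sketch below shows why that matters when a title itself contains a comma. It assumes such titles are wrapped in double quotes, as is common in CSV exports. `PigStorage(',')` splits on every comma and ignores quotes, so a row like that gets its fields shifted, while a quote-aware CSV parser keeps them aligned. The sample row and the helper names (`naive_split`, `csv_split`, `is_action`) are hypothetical and only for illustration.

```python
import csv
import io

# Hypothetical movies.csv row (assumed format): the title contains a comma,
# so it is wrapped in double quotes.
ROW = '999,"Example Movie, The (2000)",Action|Thriller'

def naive_split(line):
    """Split on every comma, ignoring quotes (how PigStorage(',') tokenizes)."""
    return line.split(',')

def csv_split(line):
    """Split with a quote-aware CSV parser from the standard library."""
    return next(csv.reader(io.StringIO(line)))

def is_action(genre_field):
    """Hypothetical helper: True if 'Action' is one of the pipe-separated genres."""
    return 'Action' in genre_field.split('|')

naive = naive_split(ROW)   # 4 fields: the title is cut in two
parsed = csv_split(ROW)    # 3 fields: movieId, title, genre

# With a plain comma split, the third field is part of the title, not the genre.
print(is_action(naive[2]), is_action(parsed[2]))  # False True
```

If the real files contain rows like this, a quote-aware loader (for example piggybank's `CSVExcelStorage`) may be worth trying in place of `PigStorage(',')`, so that such rows keep their genre field.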
The query: the total number of tags for each action movie.
Here is the Pig script I wrote to answer the query:
tags = LOAD '<path_to_dataset>/ml-20m/tags.csv'
USING PigStorage(',') AS (userId:int, movieId:int, tag:chararray, timestamp:long);
movies = LOAD '<path_to_dataset>/ml-20m/movies.csv'
USING PigStorage(',') AS (movieId:int, title:chararray, genre: chararray);
actionMovies = FILTER movies BY (genre MATCHES '.*Action.*');
actionMoviesTags = JOIN actionMovies BY movieId, tags BY movieId;
actionMovieTagsGrouped = GROUP actionMoviesTags BY actionMovies::movieId;
actionMovieNumberTags = FOREACH actionMovieTagsGrouped GENERATE group AS movieId, COUNT(actionMoviesTags.tags::tag) AS NumberTags;
DUMP actionMovieNumberTags;
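As a cross-check on the DUMP output, here is a minimal plain-Python model of what the script computes, run on a few in-memory rows. It is only a sketch of the intended semantics: the FILTER is a substring test like `MATCHES '.*Action.*'`, followed by an inner JOIN on movieId, a GROUP by movieId, and a COUNT that skips null tags. It assumes movieId is unique in movies. The function name `tags_per_action_movie` and the sample rows are hypothetical, not taken from the dataset.

```python
from collections import defaultdict

def tags_per_action_movie(movies, tags):
    """Model of FILTER -> JOIN -> GROUP -> COUNT on in-memory tuples.

    movies: (movieId, title, genre) tuples; movieId assumed unique.
    tags:   (userId, movieId, tag, timestamp) tuples.
    Returns {movieId: number of non-null tags}, sorted by movieId.
    """
    # FILTER: substring test, like genre MATCHES '.*Action.*'; null genres drop out.
    action_ids = {mid for mid, _title, genre in movies
                  if mid is not None and genre is not None and 'Action' in genre}

    # JOIN + GROUP: inner join on movieId, grouped by movieId.
    grouped = defaultdict(list)
    for _user, mid, tag, _ts in tags:
        if mid in action_ids:
            grouped[mid].append(tag)

    # COUNT ignores null values, so None tags are not counted.
    return {mid: sum(t is not None for t in tag_list)
            for mid, tag_list in sorted(grouped.items())}

# Hypothetical sample rows.
movies = [(1, 'A', 'Action|Drama'), (2, 'B', 'Comedy'), (3, 'C', 'Adventure|Action')]
tags = [(10, 1, 'fun', 0), (11, 1, 'loud', 0), (12, 2, 'meh', 0),
        (13, 3, None, 0), (14, 3, 'epic', 0)]
print(tags_per_action_movie(movies, tags))  # {1: 2, 3: 1}
```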
Here is part of the output of the script:
....
(1196,594)
(1198,472)
(1200,471)
(1208,379)
(1209,201)
(1210,454)
(1215,249)
(1224,36)
(1261,180)
(1264,19)
(1274,273)
(1275,106)
(1287,113)
(1291,331)
(1304,120)
(1320,134)
(1356,133)
....
But when I try to run the ILLUSTRATE command, I get an error: ERROR 1071: Cannot convert a tuple to an Integer
Here is the log file for the error:
Pig Stack Trace
---------------
ERROR 1071: Cannot convert a tuple to an Integer
java.io.IOException: ExecException
at org.apache.pig.PigServer.getExamples(PigServer.java:1393)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:840)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:825)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:392)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:564)
at org.apache.pig.Main.main(Main.java:175)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: actionMoviesTags: Local Rearrange[tuple]{int}(false) - scope-112 Operator Key: scope-112): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: actionMovies: Filter[bag] - scope-90 Operator Key: scope-90): org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a tuple to an Integer
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNextTuple(POUnion.java:167)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:280)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:221)
at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257)
at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:238)
at org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:104)
at org.apache.pig.pen.LineageTrimmingVisitor.<init>(LineageTrimmingVisitor.java:99)
at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:184)
at org.apache.pig.PigServer.getExamples(PigServer.java:1390)
... 14 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: actionMovies: Filter[bag] - scope-90 Operator Key: scope-90): org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a tuple to an Integer
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:90)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
... 27 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a tuple to an Integer
at org.apache.pig.data.DataType.toInteger(DataType.java:781)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNextInteger(POCast.java:503)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:347)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:406)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:323)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
... 29 more
================================================================================
I believe my script answers the query (DUMP runs fine!). I have searched all over the web for a fix, but couldn't find one.
Any help would be greatly appreciated.
PS: tags.csv and movies.csv have no header row, only data rows.
I am using the Pig shell, Grunt.