Which classes parse Hive & Pig into Map Reduce

Time: 2013-06-06 10:22:11

Tags: hadoop hive apache-pig

Which classes parse Pig and Hive commands into Map Reduce jobs, and what is the algorithm behind this parsing?

1 answer:

Answer 0: (score: 4)

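Both Pig and Hive use ANTLR to build the compilers that parse their scripts. If you are not familiar with compiler theory, I suggest reading some material on it first.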
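For readers new to compiler front ends, here is a minimal sketch of the two steps an ANTLR-generated lexer and parser perform: splitting text into tokens, then arranging the tokens into an abstract syntax tree. Everything in it is an assumption made for illustration: `ToyParser`, `Node`, and the two-statement mini-language are hypothetical, hand-written, and far simpler than the real Pig or Hive grammars.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only: a hand-written lexer and parser for a made-up, Pig-like statement. */
final class ToyParser {

    /** A generic tree node, similar in spirit to the AST nodes a generated parser emits. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }

        /** Renders the tree in LISP style: a leaf is its label, an inner node is (label child...). */
        String toStringTree() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) {
                sb.append(' ').append(c.toStringTree());
            }
            return sb.append(')').toString();
        }
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Lexer: turns characters into tokens (words, numbers, quoted strings, single symbols). */
    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '\'') {
                int end = src.indexOf('\'', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string");
                }
                tokens.add(src.substring(i, end + 1));
                i = end + 1;
            } else if (isWordChar(c)) {
                int start = i;
                while (i < src.length() && isWordChar(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    /** Parser for: alias = LOAD 'path' ;   or   alias = FILTER alias BY field op value ; */
    static Node parse(List<String> t) {
        if (t.size() < 5 || !"=".equals(t.get(1)) || !";".equals(t.get(t.size() - 1))) {
            throw new IllegalArgumentException("malformed statement: " + t);
        }
        Node alias = new Node(t.get(0));
        if ("LOAD".equalsIgnoreCase(t.get(2)) && t.size() == 5) {
            return new Node("=", alias, new Node("LOAD", new Node(t.get(3))));
        }
        if ("FILTER".equalsIgnoreCase(t.get(2)) && t.size() == 9 && "BY".equalsIgnoreCase(t.get(4))) {
            Node cond = new Node(t.get(6), new Node(t.get(5)), new Node(t.get(7)));
            return new Node("=", alias, new Node("FILTER", new Node(t.get(3)), cond));
        }
        throw new IllegalArgumentException("unsupported statement: " + t);
    }

    public static void main(String[] args) {
        // prints (= B (FILTER A (> x 1)))
        System.out.println(parse(lex("B = FILTER A BY x > 1;")).toStringTree());
    }
}
```

In the real projects this work is not written by hand: the lexer and parser classes are generated from the `.g` grammar files, and the later compilation stages consume the tree they produce.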

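For Pig, the ANTLR grammar sources are src/org/apache/pig/parser/QueryLexer.g and src/org/apache/pig/parser/QueryParser.g. They are compiled into org.apache.pig.parser.QueryLexer and org.apache.pig.parser.QueryParser. These two classes only turn the Pig script into an abstract syntax tree, though. The tree is then converted into an org.apache.pig.newplan.logical.relational.LogicalPlan, and the LogicalPlan is in turn converted into an org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan. Here are some related source files: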

org.apache.pig.newplan.logical.relational.LogicalPlan
org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.MROperPlan
org.apache.pig.parser.QueryParserDriver.parse(String)
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(LogicalPlan, Properties)
org.apache.pig.PigServer.launchPlan(PhysicalPlan, String)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(PhysicalPlan, PigContext)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(MROperPlan, MapReduceOper, Configuration, PigContext)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(MROperPlan, String)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(PhysicalPlan, String, PigContext)
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(List<Result>, List<Result>, Tuple)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce.Map.collect(Context, Tuple)
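The last lowering step, from a physical plan to MapReduce jobs, can be pictured with a deliberately simplified sketch. It assumes a linear chain of operator names and cuts the chain wherever an operator needs a shuffle. `ToyPlanCompiler`, `ToyJob`, and the `SHUFFLE_OPS` set are hypothetical names invented for this example. None of them are Pig classes, and Pig's actual compiler handles branching plans, combiners, sampling jobs, and much more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration only: none of these types are Pig classes. */
final class ToyPlanCompiler {

    /** Operators assumed to need a shuffle, so their input must cross from map to reduce. */
    static final Set<String> SHUFFLE_OPS =
            new HashSet<>(Arrays.asList("GROUP", "JOIN", "ORDER", "DISTINCT"));

    /** One MapReduce job: operators run in the mapper, then operators run in the reducer. */
    static final class ToyJob {
        final List<String> mapOps = new ArrayList<>();
        final List<String> reduceOps = new ArrayList<>();

        @Override
        public String toString() {
            return "map=" + mapOps + " reduce=" + reduceOps;
        }
    }

    /** Walks a linear operator chain and assigns each operator to a job and a phase. */
    static List<ToyJob> compile(List<String> ops) {
        List<ToyJob> jobs = new ArrayList<>();
        ToyJob current = new ToyJob();
        jobs.add(current);
        boolean inReduce = false;
        for (String op : ops) {
            if (SHUFFLE_OPS.contains(op)) {
                if (inReduce) {
                    // A job has only one shuffle, so a second shuffle operator starts a new job.
                    current = new ToyJob();
                    jobs.add(current);
                }
                // In this sketch the shuffle operator itself is placed on the reduce side.
                current.reduceOps.add(op);
                inReduce = true;
            } else if (inReduce) {
                current.reduceOps.add(op);
            } else {
                current.mapOps.add(op);
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<String> plan = Arrays.asList("LOAD", "FILTER", "GROUP", "FOREACH", "ORDER", "STORE");
        for (ToyJob job : compile(plan)) {
            System.out.println(job);
        }
        // prints:
        // map=[LOAD, FILTER] reduce=[GROUP, FOREACH]
        // map=[] reduce=[ORDER, STORE]
    }
}
```

The empty map phase of the second job is an artifact of the simplification: a real follow-up job would at least read the previous job's intermediate output in its mapper.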

For Hive, the ANTLR grammar source is ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g. It is compiled into org.apache.hadoop.hive.ql.parse.HiveLexer and org.apache.hadoop.hive.ql.parse.HiveParser. These two classes turn the Hive script into an abstract syntax tree, which is then converted into an org.apache.hadoop.hive.ql.QueryPlan. The mapper and reducer in Hive are ExecMapper and ExecReducer.

Here are some related source files:

org.apache.hadoop.hive.cli.CliDriver
org.apache.hadoop.hive.ql.Driver
org.apache.hadoop.hive.ql.Driver.run(String)
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(String, Context)
org.apache.hadoop.hive.ql.parse.ASTNode
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer
org.apache.hadoop.hive.ql.QueryPlan
org.apache.hadoop.hive.ql.Driver.compile(String, boolean)
org.apache.hadoop.hive.ql.exec.TaskRunner
org.apache.hadoop.hive.ql.Driver.execute()
org.apache.hadoop.hive.ql.exec.ExecDriver
org.apache.hadoop.hive.ql.exec.ExecMapper
org.apache.hadoop.hive.ql.exec.ExecReducer
org.apache.hadoop.hive.ql.exec.MapOperator
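One way to picture how a generic mapper can execute an arbitrary query is as a loop that pushes each input row into the root of an operator tree, with every operator forwarding its result to its child. The sketch below illustrates that idea only. `ToyOperatorTree` and its nested operator classes are hypothetical, rows are plain strings, and the real ExecMapper and MapOperator deal with deserialization, multiple children, and configuration that are left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Toy illustration only: these are not Hive classes. */
final class ToyOperatorTree {

    /** An operator processes a row and may forward a result to the next operator. */
    abstract static class ToyOperator {
        private ToyOperator child;

        /** Links the next operator after this one and returns it, so calls can be chained. */
        ToyOperator then(ToyOperator next) {
            this.child = next;
            return next;
        }

        abstract void process(String row);

        void forward(String row) {
            if (child != null) {
                child.process(row);
            }
        }
    }

    /** Forwards only the rows that satisfy the predicate (a WHERE clause, roughly). */
    static final class FilterOp extends ToyOperator {
        private final Predicate<String> keep;

        FilterOp(Predicate<String> keep) {
            this.keep = keep;
        }

        @Override
        void process(String row) {
            if (keep.test(row)) {
                forward(row);
            }
        }
    }

    /** Transforms each row before forwarding it (a SELECT list, roughly). */
    static final class SelectOp extends ToyOperator {
        private final Function<String, String> project;

        SelectOp(Function<String, String> project) {
            this.project = project;
        }

        @Override
        void process(String row) {
            forward(project.apply(row));
        }
    }

    /** Terminal operator that gathers whatever reaches the end of the chain. */
    static final class CollectOp extends ToyOperator {
        final List<String> out = new ArrayList<>();

        @Override
        void process(String row) {
            out.add(row);
        }
    }

    /** Plays the role of a mapper: feeds every input row into the root operator. */
    static List<String> runMapper(List<String> rows) {
        ToyOperator root = new FilterOp(r -> !r.isEmpty());
        CollectOp sink = new CollectOp();
        root.then(new SelectOp(r -> r.split(",")[0])).then(sink);
        for (String row : rows) {
            root.process(row);
        }
        return sink.out;
    }
}
```

Calling `ToyOperatorTree.runMapper` with the rows `"a,1"`, `""`, and `"b,2"` drops the empty row and keeps the first column of the others, giving `[a, b]`.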

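Finally, I suggest you download the source code of both projects and browse it in Eclipse to find the answer to anything else you want to know.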