I am running Hive 1.2.1 and Hadoop 2.6.0 in a single-node deployment. I have a simple JSON file in HDFS at /user/<name>/test.json.
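For reference, the file holds one JSON document per line, along these lines (reconstructed from the query output shown below, so the exact contents are approximate):

{"doc": 123}
{"doc": 456}
{"doc": 789}
{"doc": 345}
{"doc": 987}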
I create a Hive table over this data with the following statement:
CREATE EXTERNAL TABLE json_table (json string) LOCATION '/user/<name>/test.json'
This succeeds. Now I query the table from the Hive CLI:
hive> select get_json_object(json_table.json,'$.doc') from json_table;
OK
123
456
789
345
987
Now, the problematic part:
select count(*) from json_table;
This query launches a MapReduce job behind the scenes, but the job never completes, and I have to kill the query manually from the command line. Even with logging enabled from the command line, I cannot infer much from the logs or the console.
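For the record, the verbose console output pasted further down was captured with console logging turned up, via something like:

hive --hiveconf hive.root.logger=INFO,console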
In short, any query that involves a SQL function gets stuck in Hive! Am I missing some jar required for SQL functions that needs to be placed in HDFS?
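For context, if a missing UDF jar were the problem, I would expect to register it from the CLI with something like this (the jar path here is hypothetical):

hive> ADD JAR /path/to/custom-udf.jar;

or to list it under hive.aux.jars.path. But count(*) and get_json_object are built-in functions, so I am not sure a jar is involved at all.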
Pasting the console output from the command line:
hive> select count(*) from json_table;
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO parse.ParseDriver: Parsing command: select count(*) from json_table
15/11/28 21:44:23 [main]: INFO parse.ParseDriver: Parse Completed
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=parse start=1448765062471 end=1448765063122 duration=651 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Starting Semantic Analysis
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Completed phase 1 of Semantic Analysis
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Get metadata for source tables
15/11/28 21:44:23 [main]: INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=json_table
15/11/28 21:44:23 [main]: INFO HiveMetaStore.audit: ugi=sriramvaradharajan ip=unknown-ip-addr cmd=get_table : db=default tbl=json_table
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Get metadata for subqueries
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Get metadata for destination tables
15/11/28 21:44:23 [main]: INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Completed getting MetaData in Semantic Analysis
15/11/28 21:44:23 [main]: INFO parse.BaseSemanticAnalyzer: Not invoking CBO because the statement has too few joins
15/11/28 21:44:23 [main]: INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10000/.hive-staging_hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Set stats collection dir : hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10000/.hive-staging_hive_2015-11-28_21-44-22_469_6217205930037435432-1/-ext-10002
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for FS(6)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for SEL(5)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for GBY(4)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for RS(3)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for GBY(2)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for SEL(1)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for TS(0)
15/11/28 21:44:23 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 3 oldColExprMap: {VALUE._col0=Column[_col0]}
15/11/28 21:44:23 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 3 newColExprMap: {VALUE._col0=Column[_col0]}
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1448765063731 end=1448765063732 duration=1 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Completed plan generation
15/11/28 21:44:23 [main]: INFO ql.Driver: Semantic Analysis Completed
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1448765063125 end=1448765063749 duration=624 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO exec.ListSinkOperator: Initializing operator OP[7]
15/11/28 21:44:23 [main]: INFO exec.ListSinkOperator: Initialization Done 7 OP
15/11/28 21:44:23 [main]: INFO exec.ListSinkOperator: Operator 7 OP initialized
15/11/28 21:44:23 [main]: INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=compile start=1448765062445 end=1448765063772 duration=1327 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO ql.Driver: Starting command(queryId=sriramvaradharajan_20151128214422_cd5ef5dc-1862-4aeb-a4d1-361dc7be0056): select count(*) from json_table
Query ID = sriramvaradharajan_20151128214422_cd5ef5dc-1862-4aeb-a4d1-361dc7be0056
15/11/28 21:44:23 [main]: INFO ql.Driver: Query ID = sriramvaradharajan_20151128214422_cd5ef5dc-1862-4aeb-a4d1-361dc7be0056
Total jobs = 1
15/11/28 21:44:23 [main]: INFO ql.Driver: Total jobs = 1
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1448765062445 end=1448765063775 duration=1330 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=task.MAPRED.Stage-1 from=org.apache.hadoop.hive.ql.Driver>
Launching Job 1 out of 1
15/11/28 21:44:23 [main]: INFO ql.Driver: Launching Job 1 out of 1
15/11/28 21:44:23 [main]: INFO ql.Driver: Starting task [Stage-1:MAPRED] in serial mode
Number of reduce tasks determined at compile time: 1
15/11/28 21:44:23 [main]: INFO exec.Task: Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
15/11/28 21:44:23 [main]: INFO exec.Task: In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
15/11/28 21:44:23 [main]: INFO exec.Task: set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
15/11/28 21:44:23 [main]: INFO exec.Task: In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
15/11/28 21:44:23 [main]: INFO exec.Task: set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
15/11/28 21:44:23 [main]: INFO exec.Task: In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
15/11/28 21:44:23 [main]: INFO exec.Task: set mapreduce.job.reduces=<number>
15/11/28 21:44:23 [main]: INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO mr.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
15/11/28 21:44:23 [main]: INFO exec.Utilities: Processing alias json_table
15/11/28 21:44:23 [main]: INFO exec.Utilities: Adding input file hdfs://localhost:9000/user/sriramvaradharajan
15/11/28 21:44:23 [main]: INFO exec.Utilities: Content Summary not cached for hdfs://localhost:9000/user/sriramvaradharajan
15/11/28 21:44:23 [main]: INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: INFO exec.Utilities: Serializing MapWork via kryo
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=serializePlan start=1448765063830 end=1448765063973 duration=143 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: INFO Configuration.deprecation: mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: INFO exec.Utilities: Serializing ReduceWork via kryo
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=serializePlan start=1448765063980 end=1448765063994 duration=14 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: ERROR mr.ExecDriver: yarn
15/11/28 21:44:24 [main]: INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/11/28 21:44:24 [main]: INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/11/28 21:44:24 [main]: INFO exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10004/be3b1969-bd83-4db2-b51c-c5e7b6596f79/map.xml
15/11/28 21:44:24 [main]: INFO exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10004/be3b1969-bd83-4db2-b51c-c5e7b6596f79/reduce.xml
15/11/28 21:44:24 [main]: WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/11/28 21:44:24 [main]: INFO log.PerfLogger: <PERFLOG method=getSplits from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/11/28 21:44:24 [main]: INFO exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10004/be3b1969-bd83-4db2-b51c-c5e7b6596f79/map.xml
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: Total number of paths: 1, launching 1 threads to check non-combinable ones.
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://localhost:9000/user/sriramvaradharajan; using filter path hdfs://localhost:9000/user/sriramvaradharajan
15/11/28 21:44:24 [main]: INFO input.FileInputFormat: Total input paths to process : 2
15/11/28 21:44:24 [main]: INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: number of splits 1
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: Number of all splits 1
15/11/28 21:44:24 [main]: INFO log.PerfLogger: </PERFLOG method=getSplits start=1448765064446 end=1448765064479 duration=33 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/11/28 21:44:24 [main]: INFO mapreduce.JobSubmitter: number of splits:1
15/11/28 21:44:24 [main]: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1448752941502_0008
15/11/28 21:44:24 [main]: INFO impl.YarnClientImpl: Submitted application application_1448752941502_0008
15/11/28 21:44:24 [main]: INFO mapreduce.Job: The url to track the job: http://admins-MacBook-Pro.local:8088/proxy/application_1448752941502_0008/
Starting Job = job_1448752941502_0008, Tracking URL = http://admins-MacBook-Pro.local:8088/proxy/application_1448752941502_0008/
15/11/28 21:44:24 [main]: INFO exec.Task: Starting Job = job_1448752941502_0008, Tracking URL = http://admins-MacBook-Pro.local:8088/proxy/application_1448752941502_0008/
Kill Command = /usr/local/Cellar/hadoop/2.6.0/bin/hadoop job -kill job_1448752941502_0008
15/11/28 21:44:24 [main]: INFO exec.Task: Kill Command = /usr/local/Cellar/hadoop/2.6.0/bin/hadoop job -kill job_1448752941502_0008
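In case it helps with diagnosis, the hung application can also be inspected and killed from another shell with the standard YARN CLI (the application id is taken from the log above; yarn logs requires log aggregation to be enabled):

yarn application -list
yarn logs -applicationId application_1448752941502_0008
yarn application -kill application_1448752941502_0008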