I am running Hive 1.2.1 and Hadoop 2.6.0 in a single-node deployment. I have a simple JSON file in HDFS at /user/<name>/test.json.
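For reference, the file holds one JSON document per line, along these lines (reconstructed from the query output shown below, so the exact contents are approximate):

{"doc": 123}
{"doc": 456}
{"doc": 789}
{"doc": 345}
{"doc": 987}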
I create a Hive table over this data with the following statement:
CREATE EXTERNAL TABLE json_table (json string) LOCATION '/user/<name>/test.json'
This succeeds. Now I query the table from the Hive CLI:
hive> select get_json_object(json_table.json,'$.doc') from json_table;
OK
123
456
789
345
987
Now, the problematic part:
select count(*) from json_table;
This query launches a MapReduce job behind the scenes, but the job never completes, and I have to kill the query manually from the command line. Even with logging enabled from the command line, I cannot infer much from the logs or the console.
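For the record, the verbose console output pasted further down was captured with console logging turned up, via something like:

hive --hiveconf hive.root.logger=INFO,console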
In short, any query that involves a SQL function gets stuck in Hive! Am I missing some jar required for SQL functions that needs to be placed in HDFS?
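For context, if a missing UDF jar were the problem, I would expect to register it from the CLI with something like this (the jar path here is hypothetical):

hive> ADD JAR /path/to/custom-udf.jar;

or to list it under hive.aux.jars.path. But count(*) and get_json_object are built-in functions, so I am not sure a jar is involved at all.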
Pasting the console output from the command line:
hive> select count(*) from json_table;
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO parse.ParseDriver: Parsing command: select count(*) from json_table
15/11/28 21:44:23 [main]: INFO parse.ParseDriver: Parse Completed
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=parse start=1448765062471 end=1448765063122 duration=651 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Starting Semantic Analysis
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Completed phase 1 of Semantic Analysis
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Get metadata for source tables
15/11/28 21:44:23 [main]: INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=json_table
15/11/28 21:44:23 [main]: INFO HiveMetaStore.audit: ugi=sriramvaradharajan ip=unknown-ip-addr cmd=get_table : db=default tbl=json_table
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Get metadata for subqueries
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Get metadata for destination tables
15/11/28 21:44:23 [main]: INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Completed getting MetaData in Semantic Analysis
15/11/28 21:44:23 [main]: INFO parse.BaseSemanticAnalyzer: Not invoking CBO because the statement has too few joins
15/11/28 21:44:23 [main]: INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10000/.hive-staging_hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Set stats collection dir : hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10000/.hive-staging_hive_2015-11-28_21-44-22_469_6217205930037435432-1/-ext-10002
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for FS(6)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for SEL(5)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for GBY(4)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for RS(3)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for GBY(2)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for SEL(1)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for TS(0)
15/11/28 21:44:23 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 3 oldColExprMap: {VALUE._col0=Column[_col0]}
15/11/28 21:44:23 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 3 newColExprMap: {VALUE._col0=Column[_col0]}
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1448765063731 end=1448765063732 duration=1 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Completed plan generation
15/11/28 21:44:23 [main]: INFO ql.Driver: Semantic Analysis Completed
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1448765063125 end=1448765063749 duration=624 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO exec.ListSinkOperator: Initializing operator OP[7]
15/11/28 21:44:23 [main]: INFO exec.ListSinkOperator: Initialization Done 7 OP
15/11/28 21:44:23 [main]: INFO exec.ListSinkOperator: Operator 7 OP initialized
15/11/28 21:44:23 [main]: INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=compile start=1448765062445 end=1448765063772 duration=1327 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO ql.Driver: Starting command(queryId=sriramvaradharajan_20151128214422_cd5ef5dc-1862-4aeb-a4d1-361dc7be0056): select count(*) from json_table
Query ID = sriramvaradharajan_20151128214422_cd5ef5dc-1862-4aeb-a4d1-361dc7be0056
15/11/28 21:44:23 [main]: INFO ql.Driver: Query ID = sriramvaradharajan_20151128214422_cd5ef5dc-1862-4aeb-a4d1-361dc7be0056
Total jobs = 1
15/11/28 21:44:23 [main]: INFO ql.Driver: Total jobs = 1
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1448765062445 end=1448765063775 duration=1330 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=task.MAPRED.Stage-1 from=org.apache.hadoop.hive.ql.Driver>
Launching Job 1 out of 1
15/11/28 21:44:23 [main]: INFO ql.Driver: Launching Job 1 out of 1
15/11/28 21:44:23 [main]: INFO ql.Driver: Starting task [Stage-1:MAPRED] in serial mode
Number of reduce tasks determined at compile time: 1
15/11/28 21:44:23 [main]: INFO exec.Task: Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
15/11/28 21:44:23 [main]: INFO exec.Task: In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
15/11/28 21:44:23 [main]: INFO exec.Task: set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
15/11/28 21:44:23 [main]: INFO exec.Task: In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
15/11/28 21:44:23 [main]: INFO exec.Task: set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
15/11/28 21:44:23 [main]: INFO exec.Task: In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
15/11/28 21:44:23 [main]: INFO exec.Task: set mapreduce.job.reduces=<number>
15/11/28 21:44:23 [main]: INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO mr.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
15/11/28 21:44:23 [main]: INFO exec.Utilities: Processing alias json_table
15/11/28 21:44:23 [main]: INFO exec.Utilities: Adding input file hdfs://localhost:9000/user/sriramvaradharajan
15/11/28 21:44:23 [main]: INFO exec.Utilities: Content Summary not cached for hdfs://localhost:9000/user/sriramvaradharajan
15/11/28 21:44:23 [main]: INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: INFO exec.Utilities: Serializing MapWork via kryo
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=serializePlan start=1448765063830 end=1448765063973 duration=143 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: INFO Configuration.deprecation: mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: INFO exec.Utilities: Serializing ReduceWork via kryo
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=serializePlan start=1448765063980 end=1448765063994 duration=14 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: ERROR mr.ExecDriver: yarn
15/11/28 21:44:24 [main]: INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/11/28 21:44:24 [main]: INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/11/28 21:44:24 [main]: INFO exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10004/be3b1969-bd83-4db2-b51c-c5e7b6596f79/map.xml
15/11/28 21:44:24 [main]: INFO exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10004/be3b1969-bd83-4db2-b51c-c5e7b6596f79/reduce.xml
15/11/28 21:44:24 [main]: WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/11/28 21:44:24 [main]: INFO log.PerfLogger: <PERFLOG method=getSplits from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/11/28 21:44:24 [main]: INFO exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10004/be3b1969-bd83-4db2-b51c-c5e7b6596f79/map.xml
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: Total number of paths: 1, launching 1 threads to check non-combinable ones.
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://localhost:9000/user/sriramvaradharajan; using filter path hdfs://localhost:9000/user/sriramvaradharajan
15/11/28 21:44:24 [main]: INFO input.FileInputFormat: Total input paths to process : 2
15/11/28 21:44:24 [main]: INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: number of splits 1
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: Number of all splits 1
15/11/28 21:44:24 [main]: INFO log.PerfLogger: </PERFLOG method=getSplits start=1448765064446 end=1448765064479 duration=33 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/11/28 21:44:24 [main]: INFO mapreduce.JobSubmitter: number of splits:1
15/11/28 21:44:24 [main]: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1448752941502_0008
15/11/28 21:44:24 [main]: INFO impl.YarnClientImpl: Submitted application application_1448752941502_0008
15/11/28 21:44:24 [main]: INFO mapreduce.Job: The url to track the job: http://admins-MacBook-Pro.local:8088/proxy/application_1448752941502_0008/
Starting Job = job_1448752941502_0008, Tracking URL = http://admins-MacBook-Pro.local:8088/proxy/application_1448752941502_0008/
15/11/28 21:44:24 [main]: INFO exec.Task: Starting Job = job_1448752941502_0008, Tracking URL = http://admins-MacBook-Pro.local:8088/proxy/application_1448752941502_0008/
Kill Command = /usr/local/Cellar/hadoop/2.6.0/bin/hadoop job -kill job_1448752941502_0008
15/11/28 21:44:24 [main]: INFO exec.Task: Kill Command = /usr/local/Cellar/hadoop/2.6.0/bin/hadoop job -kill job_1448752941502_0008
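In case it helps with diagnosis, the hung application can also be inspected and killed from another shell with the standard YARN CLI (the application id is taken from the log above; yarn logs requires log aggregation to be enabled):

yarn application -list
yarn logs -applicationId application_1448752941502_0008
yarn application -kill application_1448752941502_0008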