Hive select count(*) fails

Date: 2017-03-10 10:50:23

Tags: hadoop mapreduce hive

I have a Hive external table containing JSON records. So far I have loaded about 9 GB of data into it. When I run `select count(*) from abc`, I get the following error:

Hadoop job information for Stage-1: number of mappers: 34; number of reducers: 1
2017-03-10 09:02:37,777 Stage-1 map = 0%,  reduce = 0%
2017-03-10 09:03:01,204 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 234.61 sec
2017-03-10 09:03:11,878 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 440.25 sec
2017-03-10 09:03:12,909 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 409.98 sec
2017-03-10 09:03:13,965 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 422.4 sec
2017-03-10 09:03:15,002 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 426.58 sec
2017-03-10 09:03:16,028 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 401.35 sec
2017-03-10 09:03:18,383 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 436.33 sec
2017-03-10 09:03:20,436 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 426.7 sec
2017-03-10 09:03:21,462 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 450.36 sec
2017-03-10 09:03:22,493 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 455.93 sec
2017-03-10 09:03:23,522 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 464.36 sec
2017-03-10 09:03:26,601 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 321.17 sec
MapReduce Total cumulative CPU time: 5 minutes 21 seconds 170 msec
Ended Job = job_1489116838071_0002 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://...........................
Task with the most failures(4):
-----
Task ID:
  task_1489116838071_0002_m_000018

URL:
  http://ip-10-16-37-124:8088/taskdetails.jsp?jobid=job_1489116838071_0002&tipid=task_1489116838071_0002_m_000018
-----
Diagnostic Messages for this Task:
Exception from container-launch.
Container id: container_1489116838071_0002_01_000065
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
        at org.apache.hadoop.util.Shell.run(Shell.java:479)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:744)


Container exited with a non-zero exit code 1


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 34  Reduce: 1   Cumulative CPU: 321.17 sec   HDFS Read: 4112196268 HDFS Write: 0 FAIL

If the table is smaller, `count(*)` works fine.
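The container-launch stack trace above only reports exit code 1 and does not show the underlying cause; the real error (often an `OutOfMemoryError` or a SerDe failure on a malformed JSON record) usually appears only in the task's own logs. A diagnostic sketch, assuming YARN log aggregation is enabled on the cluster (the application id is taken from the job id in the output above):

```shell
# Sketch: pull the aggregated logs for the failed application and
# search for the task-level exception. This only works after the
# application has finished and its logs have been aggregated.
yarn logs -applicationId application_1489116838071_0002 \
  | grep -i -A 20 -e "error" -e "exception"
```

The failing task's page linked under "Task with the most failures" (`task_1489116838071_0002_m_000018`) should show the same stderr/syslog output if the history server is reachable.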

hadoop-user-namenode-ip-xxx.log

2017-03-10 09:03:26,997 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073826134_94647{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-6d5c8723-5320-401d-ac04-7cf64fc3f723:NORMAL:10.16.37.61:50010|RBW], ReplicaUC[[DISK]DS-206e72c4-9ca8-4638-83d5-549e08a1dc04:NORMAL:10.16.37.208:50010|RBW], ReplicaUC[[DISK]DS-ccf79a81-25d6-41c2-a133-83cded4ba189:NORMAL:10.16.37.32:50010|RBW]]} for /opt/history/done_intermediate/user/job_1489116838071_0002-1489136553618-user-select+count%28*%29+from+abc%28Stage%2D1%29-1489136605998-15-0-FAILED-default-1489136557174.jhist_tmp
2017-03-10 09:03:27,010 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.16.37.32:50010 is added to blk_1073826134_94647{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-6d5c8723-5320-401d-ac04-7cf64fc3f723:NORMAL:10.16.37.61:50010|RBW], ReplicaUC[[DISK]DS-206e72c4-9ca8-4638-83d5-549e08a1dc04:NORMAL:10.16.37.208:50010|RBW], ReplicaUC[[DISK]DS-ccf79a81-25d6-41c2-a133-83cded4ba189:NORMAL:10.16.37.32:50010|RBW]]} size 0
2017-03-10 09:03:27,011 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.16.37.208:50010 is added to blk_1073826134_94647{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-6d5c8723-5320-401d-ac04-7cf64fc3f723:NORMAL:10.16.37.61:50010|RBW], ReplicaUC[[DISK]DS-206e72c4-9ca8-4638-83d5-549e08a1dc04:NORMAL:10.16.37.208:50010|RBW], ReplicaUC[[DISK]DS-ccf79a81-25d6-41c2-a133-83cded4ba189:NORMAL:10.16.37.32:50010|RBW]]} size 0
2017-03-10 09:03:27,011 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.16.37.61:50010 is added to blk_1073826134_94647{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-6d5c8723-5320-401d-ac04-7cf64fc3f723:NORMAL:10.16.37.61:50010|RBW], ReplicaUC[[DISK]DS-206e72c4-9ca8-4638-83d5-549e08a1dc04:NORMAL:10.16.37.208:50010|RBW], ReplicaUC[[DISK]DS-ccf79a81-25d6-41c2-a133-83cded4ba189:NORMAL:10.16.37.32:50010|RBW]]} size 0
2017-03-10 09:03:27,012 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /opt/history/done_intermediate/user/job_1489116838071_0002-1489136553618-user-select+count%28*%29+from+abc%28Stage%2D1%29-1489136605998-15-0-FAILED-default-1489136557174.jhist_tmp is closed by DFSClient_NONMAPREDUCE_-1618794566_1

mapred-user-historyserver-ip-xxx.log

2017-03-10 09:04:17,871 INFO org.apache.hadoop.mapreduce.jobhistory.JobSummary: jobId=job_1489116838071_0002,submitTime=1489136553618,launchTime=1489136557174,firstMapTaskLaunchTime=1489136559405,firstReduceTaskLaunchTime=1489136603997,finishTime=1489136605998,resourcesPerMap=3072,resourcesPerReduce=3072,numMaps=34,numReduces=1,user=user,queue=default,status=FAILED,mapSlotSeconds=4435,reduceSlotSeconds=2,jobName=select count(*) from abc(Stage-1)

yarn-user-resourcemanager-ip-xxx.log

2017-03-10 09:03:33,532 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1489116838071_0002,name=select count(*) from abc(Stage-1),user=user,queue=default,state=FINISHED,trackingUrl=http://ip-10-16-37-124:8088/proxy/application_1489116838071_0002/,appMasterHost=ip-10-16-37-61,startTime=1489136553618,finishTime=1489136607062,finalStatus=FAILED,memorySeconds=5025083,vcoreSeconds=1579,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=MAPREDUCE
2017-03-10 09:03:33,532 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1489116838071_0002_01_000069 Container Transitioned from ACQUIRED to KILLED
2017-03-10 09:03:33,532 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1489116838071_0002_01_000069 in state: KILLED event:KILL
2017-03-10 09:03:33,532 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=user     OPERATION=AM Released Container TARGET=SchedulerApp     RESULT=SUCCESS  APPID=application_1489116838071_0002    CONTAINERID=container_1489116838071_0002_01_000069
2017-03-10 09:03:33,532 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_1489116838071_0002_01_000069 of capacity <memory:3072, vCores:1> on host ip-10-16-37-61:38344, which currently has 0 containers, <memory:0, vCores:0> used and <memory:40960, vCores:8> available, release resources=true

I have already set values for vcores, map tasks, reduce tasks, etc. in yarn-site.xml and mapred-site.xml, but some tuning may still be needed.
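For reference, a minimal memory-tuning sketch for mapred-site.xml. The values here are illustrative assumptions, not recommendations; the container size matches the `resourcesPerMap=3072` seen in the history-server log above, and everything must fit within the NodeManager capacity (`<memory:40960, vCores:8>`) shown in the ResourceManager log:

```xml
<!-- mapred-site.xml: illustrative values only -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>3072</value> <!-- per-map container size, in MB -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2457m</value> <!-- JVM heap, ~80% of the container -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2457m</value>
</property>
```

If the heap (`-Xmx`) is set too close to, or above, the container size, YARN kills or fails the container, which can surface as exactly this kind of opaque non-zero exit code.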

0 Answers