Spark工作无法完成

时间:2015-03-16 10:56:10

标签: hadoop apache-spark oozie

以前我使用命令行和spark-submit命令运行spark job。

sudo -su root spark-submit --executor-memory 512m --num-executors 50 --class com.mycompany.project.SparkHdfsToHBase --master yarn parser-1.0-jar-with-dependencies.jar 0 hdfs://company.com:8020/tmp/testfiles/html_files/* 127.0.0.1 --verbose

一切都像魅力一样。

现在我正在尝试通过oozie工作流程运行spark工作。我已创建工作流并使用python脚本添加shell操作。

Python脚本如下所示:

#!/usr/bin/env python

import subprocess

subprocess.call(["spark-submit", "--class", "com.mycompany.project.SparkHdfsToHBase", "--executor-memory", "512m", "--driver-memory", "512m", "--master", "yarn", "file:///parser-1.0-jar-with-dependencies.jar", "0", "hdfs://company.com:8020/tmp/testfiles/html_files/*", "127.0.0.1", "--verbose"])

我还没有为作业添加任何其他参数,只添加了文件路径。当我提交工作时,在工作浏览器上我的工作被困在5%。

enter image description here

我还没有从日志中获得任何有用的信息:

Syslog:

                2015-03-16 11:46:51,152 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2015-03-16 11:46:51,152 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
2015-03-16 11:46:51,154 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
2015-03-16 11:46:51,157 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
2015-03-16 11:46:51,172 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2015-03-16 11:46:51,354 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2015-03-16 11:46:51,354 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@46ea3050)
2015-03-16 11:46:51,393 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: RM_DELEGATION_TOKEN, Service: 172.24.4.231:8032, Ident: (owner=admin, renewer=oozie mr token, realUser=oozie, issueDate=1426499201955, maxDate=1427104001955, sequenceNumber=6, masterKeyId=2)
2015-03-16 11:46:51,631 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
2015-03-16 11:46:51,633 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2015-03-16 11:46:51,637 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
2015-03-16 11:46:51,640 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
2015-03-16 11:46:51,644 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
2015-03-16 11:46:51,661 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2015-03-16 11:46:52,495 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-03-16 11:46:52,803 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2015-03-16 11:46:52,805 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
2015-03-16 11:46:52,868 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2015-03-16 11:46:52,869 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2015-03-16 11:46:52,870 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2015-03-16 11:46:52,871 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2015-03-16 11:46:52,872 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2015-03-16 11:46:52,873 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2015-03-16 11:46:52,874 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2015-03-16 11:46:52,875 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2015-03-16 11:46:52,967 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2015-03-16 11:46:53,241 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-03-16 11:46:53,303 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-03-16 11:46:53,303 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
2015-03-16 11:46:53,320 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1426498753001_0001 to jobTokenSecretManager
2015-03-16 11:46:53,447 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1426498753001_0001 because: not enabled;
2015-03-16 11:46:53,473 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1426498753001_0001 = 0. Number of splits = 1
2015-03-16 11:46:53,473 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1426498753001_0001 = 0
2015-03-16 11:46:53,473 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1426498753001_0001Job Transitioned from NEW to INITED
2015-03-16 11:46:53,474 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1426498753001_0001.
2015-03-16 11:46:53,532 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-03-16 11:46:53,542 INFO [Socket Reader #1 for port 44690] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 44690
2015-03-16 11:46:53,564 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2015-03-16 11:46:53,565 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-03-16 11:46:53,566 INFO [IPC Server listener on 44690] org.apache.hadoop.ipc.Server: IPC Server listener on 44690: starting
2015-03-16 11:46:53,566 INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at company.com/172.24.4.231:44690
2015-03-16 11:46:53,641 INFO [main] org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-03-16 11:46:53,646 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.mapreduce is not defined
2015-03-16 11:46:53,658 INFO [main] org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2015-03-16 11:46:53,663 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context mapreduce
2015-03-16 11:46:53,663 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
2015-03-16 11:46:53,667 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /mapreduce/*
2015-03-16 11:46:53,667 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2015-03-16 11:46:53,678 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 57328
2015-03-16 11:46:53,678 INFO [main] org.mortbay.log: jetty-6.1.26.cloudera.4
2015-03-16 11:46:53,706 INFO [main] org.mortbay.log: Extract jar:file:/opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/jars/hadoop-yarn-common-2.5.0-cdh5.3.1.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_57328_mapreduce____37uqf6/webapp
2015-03-16 11:46:54,346 INFO [main] org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:57328
2015-03-16 11:46:54,346 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app /mapreduce started at 57328
2015-03-16 11:46:55,005 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2015-03-16 11:46:55,012 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-03-16 11:46:55,012 INFO [Socket Reader #1 for port 60762] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 60762
2015-03-16 11:46:55,021 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-03-16 11:46:55,023 INFO [IPC Server listener on 60762] org.apache.hadoop.ipc.Server: IPC Server listener on 60762: starting
2015-03-16 11:46:55,111 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2015-03-16 11:46:55,111 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2015-03-16 11:46:55,111 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2015-03-16 11:46:55,369 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
2015-03-16 11:46:55,369 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2015-03-16 11:46:55,369 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
2015-03-16 11:46:55,370 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
2015-03-16 11:46:55,375 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
2015-03-16 11:46:55,383 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2015-03-16 11:46:55,388 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at company.com/172.24.4.231:8030
2015-03-16 11:46:55,574 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: <memory:8192, vCores:2>
2015-03-16 11:46:55,574 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.admin
2015-03-16 11:46:55,582 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2015-03-16 11:46:55,584 INFO [main] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
2015-03-16 11:46:55,599 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1426498753001_0001Job Transitioned from INITED to SETUP
2015-03-16 11:46:55,611 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2015-03-16 11:46:55,633 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1426498753001_0001Job Transitioned from SETUP to RUNNING
2015-03-16 11:46:55,714 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1426498753001_0001_m_000000 Task Transitioned from NEW to SCHEDULED
2015-03-16 11:46:55,723 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1426498753001_0001_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2015-03-16 11:46:55,738 INFO [Thread-50] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:1024, vCores:1>
2015-03-16 11:46:55,797 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1426498753001_0001, File: hdfs://company.com:8020/user/admin/.staging/job_1426498753001_0001/job_1426498753001_0001_1.jhist
2015-03-16 11:46:56,584 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2015-03-16 11:46:56,658 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1426498753001_0001: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:7168, vCores:0> knownNMs=1

标准错误:

                Mar 16, 2015 11:47:04 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Mar 16, 2015 11:47:04 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Mar 16, 2015 11:47:04 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Mar 16, 2015 11:47:04 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Mar 16, 2015 11:47:04 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Mar 16, 2015 11:47:05 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Mar 16, 2015 11:47:05 AM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Mar 16, 2015 11:47:05 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"

我很乐意得到任何帮助。谢谢。

0 个答案:

没有答案