Submitting a Spark 2.3.2 job from Eclipse

Time: 2018-11-09 12:05:55

Tags: apache-spark

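I am triggering a Spark job from Java in Eclipse for the first time, using winutils.exe. When I submit the job from Eclipse, it fails with: Library directory '<<SPARK_HOME>>\assembly\target\scala-2.11\jars' does not exist; make sure Spark is built. Full log: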

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/09 17:07:16 INFO SparkContext: Running Spark version 2.3.2
18/11/09 17:07:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/09 17:07:17 WARN SparkConf: spark.master yarn-client is deprecated in Spark 2.0+, please instead use "yarn" with specified deploy mode.
18/11/09 17:07:17 INFO SparkContext: Submitted application: test-spark-job
18/11/09 17:07:17 INFO SecurityManager: Changing view acls to: PARAY
18/11/09 17:07:17 INFO SecurityManager: Changing modify acls to: PARAY
18/11/09 17:07:17 INFO SecurityManager: Changing view acls groups to: 
18/11/09 17:07:17 INFO SecurityManager: Changing modify acls groups to: 
18/11/09 17:07:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(PARAY); groups with view permissions: Set(); users  with modify permissions: Set(PARAY); groups with modify permissions: Set()
18/11/09 17:07:20 INFO Utils: Successfully started service 'sparkDriver' on port 56603.
18/11/09 17:07:20 INFO SparkEnv: Registering MapOutputTracker
18/11/09 17:07:20 INFO SparkEnv: Registering BlockManagerMaster
18/11/09 17:07:20 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/11/09 17:07:20 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/11/09 17:07:20 INFO DiskBlockManager: Created local directory at C:\Users\PARAY\AppData\Local\Temp\blockmgr-736b1a14-ff56-4fea-b5db-5ffac37be31f
18/11/09 17:07:20 INFO MemoryStore: MemoryStore started with capacity 873.0 MB
18/11/09 17:07:20 INFO SparkEnv: Registering OutputCommitCoordinator
18/11/09 17:07:21 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/11/09 17:07:21 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://PARAY-IN.in.oracle.com:4040
18/11/09 17:07:22 INFO RMProxy: Connecting to ResourceManager at whf00aql/10.184.155.224:8032
18/11/09 17:07:23 INFO Client: Requesting a new application from cluster with 3 NodeManagers
18/11/09 17:07:23 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2312 MB per container)
18/11/09 17:07:23 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/11/09 17:07:23 INFO Client: Setting up container launch context for our AM
18/11/09 17:07:23 INFO Client: Setting up the launch environment for our AM container
18/11/09 17:07:23 INFO Client: Preparing resources for our AM container
18/11/09 17:07:23 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/09 17:07:23 INFO Client: Deleted staging directory hdfs://whf00aql/user/PARAY/.sparkStaging/application_1540636880940_0004
18/11/09 17:07:23 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Library directory '<<SPARK_HOME>>\assembly\target\scala-2.11\jars' does not exist; make sure Spark is built.
    at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
    at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:342)
    at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:556)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:876)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:173)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at com.ofss.ng.poc.test.util.TestSession.testSession(TestSession.java:33)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:105)
    at org.junit.runner.JUnitCore.runClasses(JUnitCore.java:62)
    at org.junit.runner.JUnitCore.runClasses(JUnitCore.java:49)
    at com.ofss.ng.poc.test.util.TestRunner.main(TestRunner.java:17)

I took the SPARK2 parcel folder and set it as SPARK_HOME on my local Windows machine, but I do not see an assembly folder in it. The code is below.

    System.setProperty("SPARK_YARN_MODE", "true");
    SparkConf sparkConfiguration = new SparkConf();
    sparkConfiguration.setMaster("yarn-client");
    sparkConfiguration.setAppName("test-spark-job");
    //sparkConfiguration.setJars(new String[] { "C:\\Work\\workspaces\\SparkJvGradlePOC\\build\\libs" });

    // Point the Hadoop client at the remote cluster.
    sparkConfiguration.set("spark.hadoop.fs.defaultFS", "hdfs://whf00aql");
    sparkConfiguration.set("spark.hadoop.dfs.nameservices", "whf00aql:8020");
    sparkConfiguration.set("spark.hadoop.yarn.resourcemanager.hostname", "whf00aql");
    sparkConfiguration.set("spark.hadoop.yarn.resourcemanager.address", "whf00aql:8032");
    sparkConfiguration.set("spark.hadoop.yarn.application.classpath",
            "$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,"
                    + "$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,"
                    + "$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,"
                    + "$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*");

    // The exception in the log is thrown by this constructor call.
    SparkContext sparkContext = new SparkContext(sparkConfiguration);
    JavaSparkContext javaSparkContext = new JavaSparkContext(sparkContext);
    String str = "Session OK";
    try {
        SparkSession sp = SessionSingleton.getSession("TestSession");
    } catch (Throwable e) {
        str = "Session failed";
    }

SessionSingleton code:

import org.apache.spark.sql.SparkSession;

public class SessionSingleton {

    private static SparkSession sp = null;

    // Returns the shared SparkSession, creating it on first use.
    public static SparkSession getSession(String sessionCode) {
        if (sp == null) {
            System.out.println("creating sparksession");
            sp = SparkSession
                    .builder()
                    .appName(sessionCode)
                    // .config("spark.some.config.option", "some-value")
                    // .master("use spark-submit")
                    .enableHiveSupport()
                    .config("spark.sql.warehouse.dir", "target/spark-warehouse")
                    .getOrCreate();
        }
        return sp;
    }
}

1 answer:

Answer 0 (score: 0)

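As the warning in the log says, neither spark.yarn.jars nor spark.yarn.archive is set, so Spark falls back to uploading libraries from SPARK_HOME and fails because it cannot find them there. You need to upload the Spark library JARs to HDFS and set one of these properties.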
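To see why the fallback fails on your machine, you can check what it would find under SPARK_HOME. Below is a minimal sketch using only java.io. It approximates the lookup that the stack trace points to (CommandBuilderUtils.findJarsDir), which, as far as I recall from the Spark 2.x sources, tries SPARK_HOME/jars first and then the assembly/target/scala-<version>/jars build directory. The class name SparkJarsDirCheck and its method are my own illustration, not part of Spark.

```java
import java.io.File;

// Hypothetical helper (not a Spark class): approximates the directory lookup
// behind the "Library directory ... does not exist" error.
public class SparkJarsDirCheck {

    // Returns the directory the fallback would use, or null if neither candidate exists.
    public static File findJarsDir(String sparkHome, String scalaVersion) {
        // Layout of a binary distribution: SPARK_HOME/jars
        File released = new File(sparkHome, "jars");
        if (released.isDirectory()) {
            return released;
        }
        // Layout of a source build: SPARK_HOME/assembly/target/scala-<version>/jars
        File built = new File(sparkHome, "assembly/target/scala-" + scalaVersion + "/jars");
        return built.isDirectory() ? built : null;
    }

    public static void main(String[] args) {
        String sparkHome = System.getenv("SPARK_HOME");
        if (sparkHome == null) {
            System.out.println("SPARK_HOME is not set");
            return;
        }
        // "2.11" is the Scala version shown in the error message of the question.
        File dir = findJarsDir(sparkHome, "2.11");
        if (dir == null) {
            System.out.println("No jars directory found under " + sparkHome);
        } else {
            System.out.println("Spark would upload JARs from " + dir.getAbsolutePath());
        }
    }
}
```

If this reports that no directory was found, SPARK_HOME probably points at the wrong level of the folder you copied, or the folder does not contain the Spark JARs at all.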

Steps:

Upload the JARs and configure their location:

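Manually upload the Spark JARs to HDFS. Spark 2.x binary distributions keep them under $SPARK_HOME/jars; there is no single assembly JAR as there was in Spark 1.x:

$ hdfs dfs -mkdir -p /user/spark/share/lib
$ hdfs dfs -put $SPARK_HOME/jars/*.jar /user/spark/share/lib/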

Set spark.yarn.jars to the HDFS path:

sparkConfiguration.set("spark.yarn.jars", "hdfs://namenode:8020/user/spark/share/lib/*.jar");
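Putting it together, here is a rough sketch of the settings I would expect the driver to need. It also replaces the deprecated "yarn-client" master, which the log warns about, with "yarn" plus a client deploy mode. It is plain Java with no Spark dependency so it can be run on its own. The class name YarnClientSettings is hypothetical, and the HDFS URI and directory are assumptions taken from the question's fs.defaultFS and the upload path above, so adjust them to your cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Spark class): collects the YARN-related settings
// discussed above so they can be copied onto a SparkConf.
public class YarnClientSettings {

    // hdfsUri: e.g. the value of fs.defaultFS; jarsDir: HDFS directory holding the Spark JARs.
    public static Map<String, String> build(String hdfsUri, String jarsDir) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Replacement for the deprecated "yarn-client" master string.
        settings.put("spark.master", "yarn");
        settings.put("spark.submit.deployMode", "client");
        // Tells Spark where the already uploaded JARs are, so it skips the SPARK_HOME fallback.
        settings.put("spark.yarn.jars", hdfsUri + jarsDir + "/*.jar");
        return settings;
    }

    public static void main(String[] args) {
        // Assumed values: the URI comes from the question, the directory from the upload step.
        Map<String, String> settings = build("hdfs://whf00aql", "/user/spark/share/lib");
        // In the question's code you would apply them with:
        //   settings.forEach((k, v) -> sparkConfiguration.set(k, v));
        settings.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With these applied, the setMaster("yarn-client") call in the question is no longer needed, since spark.master and spark.submit.deployMode cover the same ground.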