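I am triggering a Spark job from Java in Eclipse for the first time, using winutils.exe on Windows. When I submit the job from Eclipse, it fails with: Library directory '<<SPARK_HOME>>\assembly\target\scala-2.11\jars' does not exist; make sure Spark is built. Full log: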
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/09 17:07:16 INFO SparkContext: Running Spark version 2.3.2
18/11/09 17:07:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/09 17:07:17 WARN SparkConf: spark.master yarn-client is deprecated in Spark 2.0+, please instead use "yarn" with specified deploy mode.
18/11/09 17:07:17 INFO SparkContext: Submitted application: test-spark-job
18/11/09 17:07:17 INFO SecurityManager: Changing view acls to: PARAY
18/11/09 17:07:17 INFO SecurityManager: Changing modify acls to: PARAY
18/11/09 17:07:17 INFO SecurityManager: Changing view acls groups to:
18/11/09 17:07:17 INFO SecurityManager: Changing modify acls groups to:
18/11/09 17:07:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(PARAY); groups with view permissions: Set(); users with modify permissions: Set(PARAY); groups with modify permissions: Set()
18/11/09 17:07:20 INFO Utils: Successfully started service 'sparkDriver' on port 56603.
18/11/09 17:07:20 INFO SparkEnv: Registering MapOutputTracker
18/11/09 17:07:20 INFO SparkEnv: Registering BlockManagerMaster
18/11/09 17:07:20 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/11/09 17:07:20 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/11/09 17:07:20 INFO DiskBlockManager: Created local directory at C:\Users\PARAY\AppData\Local\Temp\blockmgr-736b1a14-ff56-4fea-b5db-5ffac37be31f
18/11/09 17:07:20 INFO MemoryStore: MemoryStore started with capacity 873.0 MB
18/11/09 17:07:20 INFO SparkEnv: Registering OutputCommitCoordinator
18/11/09 17:07:21 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/11/09 17:07:21 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://PARAY-IN.in.oracle.com:4040
18/11/09 17:07:22 INFO RMProxy: Connecting to ResourceManager at whf00aql/10.184.155.224:8032
18/11/09 17:07:23 INFO Client: Requesting a new application from cluster with 3 NodeManagers
18/11/09 17:07:23 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2312 MB per container)
18/11/09 17:07:23 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/11/09 17:07:23 INFO Client: Setting up container launch context for our AM
18/11/09 17:07:23 INFO Client: Setting up the launch environment for our AM container
18/11/09 17:07:23 INFO Client: Preparing resources for our AM container
18/11/09 17:07:23 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/09 17:07:23 INFO Client: Deleted staging directory hdfs://whf00aql/user/PARAY/.sparkStaging/application_1540636880940_0004
18/11/09 17:07:23 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Library directory '<<SPARK_HOME>>\assembly\target\scala-2.11\jars' does not exist; make sure Spark is built.
at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:342)
at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:556)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:876)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:173)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at com.ofss.ng.poc.test.util.TestSession.testSession(TestSession.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
at org.junit.runner.JUnitCore.run(JUnitCore.java:105)
at org.junit.runner.JUnitCore.runClasses(JUnitCore.java:62)
at org.junit.runner.JUnitCore.runClasses(JUnitCore.java:49)
at com.ofss.ng.poc.test.util.TestRunner.main(TestRunner.java:17)
I copied the SPARK2 parcel folder and set it as SPARK_HOME on my local Windows machine, but I do not see an assembly folder in it. Here is the code:
System.setProperty("SPARK_YARN_MODE", "true");
SparkConf sparkConfiguration = new SparkConf();
sparkConfiguration.setMaster("yarn-client");
sparkConfiguration.setAppName("test-spark-job");
//sparkConfiguration.setJars(new String[] { "C:\\Work\\workspaces\\SparkJvGradlePOC\\build\\libs" });
sparkConfiguration.set("spark.hadoop.fs.defaultFS", "hdfs://whf00aql");
sparkConfiguration.set("spark.hadoop.dfs.nameservices", "whf00aql:8020");
sparkConfiguration.set("spark.hadoop.yarn.resourcemanager.hostname", "whf00aql");
sparkConfiguration.set("spark.hadoop.yarn.resourcemanager.address", "whf00aql:8032");
sparkConfiguration.set("spark.hadoop.yarn.application.classpath",
        "$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,"
        + "$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,"
        + "$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,"
        + "$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*");
SparkContext sparkContext = new SparkContext(sparkConfiguration);
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkContext);
String str = "Sesison Ok";
try {
    SparkSession sp = SessionSingleton.getSession("TestSession");
} catch (Throwable e) {
    str = "Session failed";
}
SessionSingleton code:
import org.apache.spark.sql.SparkSession;

public class SessionSingleton {
    private static SparkSession sp = null;

    public static SparkSession getSession(String sessionCode) {
        // Create the session only on the first call; later calls reuse it.
        if (sp == null) {
            System.out.println("creating sparksession");
            sp = SparkSession
                    .builder()
                    .appName(sessionCode)
                    // .config("spark.some.config.option", "some-value")
                    //.master("use spark-submit")
                    .enableHiveSupport()
                    .config("spark.sql.warehouse.dir", "target/spark-warehouse")
                    .getOrCreate();
        }
        return sp;
    }
}
Answer 0 (score: 0)
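As the log already states, neither spark.yarn.jars nor spark.yarn.archive is set, so Spark falls back to uploading the libraries under SPARK_HOME and cannot find them. You have to upload the Spark library JARs yourself and set this configuration.
Steps:
Upload the JARs and configure the JAR location: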
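To see why the fallback fails on the local machine, it can help to check which library directories actually exist under SPARK_HOME. The following is a minimal diagnostic sketch, not Spark's own code: the class name `SparkJarsDirCheck` and its method are hypothetical. It assumes, based on the path in the error message, that the launcher looks for a `jars` directory first and for `assembly/target/scala-<version>/jars` second.

```java
import java.io.File;

// Hypothetical diagnostic helper; it is not part of Spark.
final class SparkJarsDirCheck {

    // Returns the first existing library directory under sparkHome, or null if none exists.
    // Candidate 1: <sparkHome>/jars (layout of a binary distribution).
    // Candidate 2: <sparkHome>/assembly/target/scala-<version>/jars (layout of a source build).
    static File findJarsDir(File sparkHome, String scalaVersion) {
        File distJars = new File(sparkHome, "jars");
        if (distJars.isDirectory()) {
            return distJars;
        }
        File buildJars = new File(sparkHome, "assembly/target/scala-" + scalaVersion + "/jars");
        if (buildJars.isDirectory()) {
            return buildJars;
        }
        return null;
    }

    public static void main(String[] args) {
        String home = System.getenv("SPARK_HOME");
        if (home == null) {
            System.out.println("SPARK_HOME is not set");
            return;
        }
        File jars = findJarsDir(new File(home), "2.11");
        System.out.println(jars == null
                ? "No Spark library directory found under " + home
                : "Spark JARs found in " + jars);
    }
}
```

If this reports that no directory was found, the local SPARK_HOME has neither layout, and pointing spark.yarn.jars at a copy on HDFS avoids the local lookup altogether.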
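Manually upload the Spark library JARs to HDFS (a Spark 2.x binary distribution keeps them in $SPARK_HOME/jars; Spark 2.x no longer ships a single assembly JAR):
$ hdfs dfs -mkdir -p /user/spark/share/lib
$ hdfs dfs -put $SPARK_HOME/jars/*.jar /user/spark/share/lib/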
Set spark.yarn.jars to the HDFS path:
sparkConfiguration.set("spark.yarn.jars", "hdfs://namenode:8020/user/spark/share/lib/*.jar");
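Putting the settings together, here is a small sketch of the key/value pairs the driver would need. It is written against plain Java collections so that it runs without Spark on the classpath; the class `YarnClientSettings` and its methods are hypothetical, and `namenode:8020` is the placeholder address used above. It also replaces the deprecated `yarn-client` master from the question with `yarn` plus an explicit client deploy mode, as the warning in the log recommends.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that collects the settings discussed above.
final class YarnClientSettings {

    // Builds the glob for spark.yarn.jars, tolerating a trailing slash on the directory.
    static String jarsGlob(String nameNode, String hdfsDir) {
        String dir = hdfsDir.endsWith("/")
                ? hdfsDir.substring(0, hdfsDir.length() - 1)
                : hdfsDir;
        return "hdfs://" + nameNode + dir + "/*.jar";
    }

    // Returns the settings in insertion order.
    static Map<String, String> settings(String nameNode, String hdfsDir) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.master", "yarn");               // replaces the deprecated "yarn-client"
        conf.put("spark.submit.deployMode", "client");  // driver runs in the local JVM
        conf.put("spark.yarn.jars", jarsGlob(nameNode, hdfsDir));
        return conf;
    }

    public static void main(String[] args) {
        settings("namenode:8020", "/user/spark/share/lib")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In the actual job, each entry would be passed to `sparkConfiguration.set(key, value)` before the SparkContext is created, in place of the `setMaster("yarn-client")` call.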