I am running a standalone Spark application with Maven, but I have run into some errors. My cluster has one master and three slaves, and the Spark version is 0.9.1. My pom.xml is as follows.
<groupId>spark</groupId>
<artifactId>testspark</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>testspark</name>
<url>http://maven.apache.org</url>

<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<repositories>
  <repository>
    <id>Akka repository</id>
    <url>http://repo.akka.io/releases</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.1</version>
  </dependency>
</dependencies>
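Judging from the ExecJavaMojo frames in the error output below, the application is launched with mvn exec:java, so the pom presumably also carries an exec-maven-plugin block along these lines (a sketch; the mainClass value is a placeholder, not taken from the post):

<build>
  <plugins>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>exec-maven-plugin</artifactId>
      <configuration>
        <!-- placeholder main class, not given in the original post -->
        <mainClass>spark.testspark.SimpleApp</mainClass>
      </configuration>
    </plugin>
  </plugins>
</build>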
My Spark application is as follows.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public static void main(String[] args) {
    String sparkHome = System.getenv("SPARK_HOME");
    System.out.println(sparkHome);

    // File to read; this local path must exist on every node that runs a task.
    String logFile = "/usr/Java/spark-0.9.1-bin-hadoop1/README.md";

    SparkConf conf = new SparkConf();
    conf.setMaster("spark://192.168.23.123:7077")
        .setAppName("Simple App")
        .setSparkHome(System.getenv("SPARK_HOME"))
        .setJars(new String[] { "target/testspark-0.0.1-SNAPSHOT.jar" })
        .set("spark.executor.memory", "1g");

    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logData = sc.textFile(logFile).cache();

    // Count the lines containing "a".
    long numAs = logData.filter(new Function<String, Boolean>() {
        public Boolean call(String s) {
            return s.contains("a");
        }
    }).count();

    // Count the lines containing "b".
    long numBs = logData.filter(new Function<String, Boolean>() {
        public Boolean call(String s) {
            return s.contains("b");
        }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    sc.stop();
}
My master is 192.168.23.123. After running, the error output is as follows.
Caused by: org.apache.maven.plugin.MojoExecutionException: An exception occured while executing the Java class. null
at org.codehaus.mojo.exec.ExecJavaMojo.execute(ExecJavaMojo.java:345)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:133)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
... 19 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 4 times (most recent failure: Exception failure: java.lang.OutOfMemoryError: Java heap space)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
As for the OutOfMemoryError, I added .set("spark.executor.memory", "1g"), but it does not work. Any suggestions?
Answer 0 (score: -1)
A Java application is only allowed to use a limited amount of memory, and this limit is specified when the application starts. To make things more complicated, Java memory is divided into two different regions, called heap space and permgen.
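For example, on a Java 7-era JVM like the one in the trace above, each region is sized with its own startup flag. The values below are purely illustrative, and the classpath and main class are placeholders:

java -Xmx1g -XX:MaxPermSize=256m -cp <classpath> spark.testspark.SimpleApp

-Xmx caps the heap space; -XX:MaxPermSize caps permgen.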
The "java.lang.OutOfMemoryError: Java heap space" error you are seeing is triggered when you try to add more data into the heap space region than the JVM can fit into the Java heap space.
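The effect is easy to reproduce in isolation. Here is a minimal sketch (not from the original post; HeapFiller is a made-up class). Run it with a small heap, e.g. java -Xmx16m HeapFiller, and it fails with exactly this error:

import java.util.ArrayList;
import java.util.List;

// Keeps allocating 1 MB blocks and holding references to them, so the garbage
// collector can free nothing; once the heap limit is reached, the JVM throws
// java.lang.OutOfMemoryError: Java heap space.
public class HeapFiller {
    public static void main(String[] args) {
        List<byte[]> hog = new ArrayList<byte[]>();
        while (true) {
            hog.add(new byte[1024 * 1024]);
        }
    }
}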
To fix it, you usually increase the heap size by adding the -Xmx option to the JVM startup parameters, or by increasing its value if it is already present.
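One detail matters in this particular setup: the stack trace shows the job being launched through mvn exec:java, and exec:java runs your main class inside Maven's own JVM, which spark.executor.memory does not resize. If that JVM is the one that needs more heap, the usual knob is MAVEN_OPTS, which Maven passes to its JVM on startup. A sketch, where the 2g value is only an example and the main class name is a placeholder:

export MAVEN_OPTS="-Xmx2g"
mvn exec:java -Dexec.mainClass="spark.testspark.SimpleApp"

The executors on the workers keep their own limit from spark.executor.memory.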