When I run my Spark job, it appears to get stuck at collect.
I launch the jar with the following command:
./spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
--class com.MyObject \
--master spark://192.168.192.22:7077 \
--executor-memory 512M \
--driver-memory 512M \
--deploy-mode cluster \
--total-executor-cores 4 \
/home/pi/spark-job-jars/spark-job-0.0.1-SNAPSHOT.jar
Jar source:

package com

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object MyObject {
  def main(args: Array[String]) {
    println("here")
    val sc = new SparkContext(new SparkConf())
    val l = (1 to 10).toList
    val s = sc.parallelize(l)
    val out = s.map(m => m * 3)
    out.collect.foreach(println)
  }
}
Jar pom:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark-job</groupId>
  <artifactId>spark-job</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <resources>
      <resource>
        <directory>src</directory>
        <excludes>
          <exclude>**/*.java</exclude>
        </excludes>
      </resource>
    </resources>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>1.2.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>1.2.1</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
I can see the job running, but it never completes.
What have I done wrong in creating/deploying the jar that prevents it from finishing the job?
Answer 0 (score: 1)
"Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications."
Taken from: https://spark.apache.org/docs/1.2.0/submitting-applications.html
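
Your command submits with --deploy-mode cluster against a standalone master (spark://192.168.192.22:7077), which the quoted documentation says is not supported, so the driver never actually runs and the job never completes. A minimal sketch of resubmitting in the default client mode (the same command with the --deploy-mode flag dropped; paths and master URL taken from the question):

./spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
--class com.MyObject \
--master spark://192.168.192.22:7077 \
--executor-memory 512M \
--driver-memory 512M \
--total-executor-cores 4 \
/home/pi/spark-job-jars/spark-job-0.0.1-SNAPSHOT.jar

In client mode the driver runs on the machine where spark-submit is invoked, so the println output from collect should appear directly in your terminal.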