Spark job stuck on the collect method

Asked: 2015-04-03 23:31:08

Tags: apache-spark

When I run my Spark job, it seems to get stuck at collect:

[Spark UI screenshot]

I launch the jar with this command:

./spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
  --class com.MyObject \
  --master spark://192.168.192.22:7077 \
  --executor-memory 512M \
  --driver-memory 512M \
  --deploy-mode cluster \
  --total-executor-cores 4 \
  /home/pi/spark-job-jars/spark-job-0.0.1-SNAPSHOT.jar

Jar source:

package com

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object MyObject {

  def main(args: Array[String]) {

    println("here")


    val sc = new SparkContext(new SparkConf())

    val l = (1 to 10).toList
    val s = sc.parallelize(l)
    val out = s.map(m => m * 3)
    out.collect.foreach(println)

  }

}

Jar pom:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>spark-job</groupId>
    <artifactId>spark-job</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <build>
        <sourceDirectory>src</sourceDirectory>
        <resources>
            <resource>
                <directory>src</directory>
                <excludes>
                    <exclude>**/*.java</exclude>
                </excludes>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.5</source>
                    <target>1.5</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <dependencies>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>1.2.1</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>1.2.1</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

</project>

I can see that the job is running but it never completes: [Spark UI screenshot]

What have I done wrong in creating/deploying the jar that prevents the job from completing?

1 Answer:

Answer 0 (score: 1):

"Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or Python applications."

Taken from: https://spark.apache.org/docs/1.2.0/submitting-applications.html
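
In other words, the submit command in the question uses --deploy-mode cluster against a standalone master (spark://...), which that release did not support. A minimal sketch of a resubmission in client deploy mode, assuming the same paths, class name, and master URL as in the question:

# Same job, but with --deploy-mode client instead of cluster
./spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
  --class com.MyObject \
  --master spark://192.168.192.22:7077 \
  --executor-memory 512M \
  --driver-memory 512M \
  --deploy-mode client \
  --total-executor-cores 4 \
  /home/pi/spark-job-jars/spark-job-0.0.1-SNAPSHOT.jar

In client mode the driver runs inside the spark-submit process itself, so the collected results and the println output appear directly in the launching terminal. Omitting --deploy-mode entirely has the same effect, since client is the default.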