Question

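I need to create a Java program that submits Python scripts (using PySpark) to a Yarn cluster. As far as I can tell, using SparkLauncher is equivalent to using a YarnClient, because it uses the built-in Yarn client (writing my own Yarn client is crazy; I tried, and there is too much to handle). So I wrote: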

public static void main(String[] args) throws Exception {
    String SPARK_HOME = System.getProperty("SPARK_HOME");
    submit(SPARK_HOME, args);
}

static void submit(String SPARK_HOME, String[] args) throws Exception {
    String[] arguments = new String[]{
            // application name
            "--name",
            "SparkPi-Python",

            "--class",
            "org.apache.spark.deploy.PythonRunner",

            "--py-files",
            SPARK_HOME + "/python/lib/pyspark.zip,"+ SPARK_HOME +"/python/lib/py4j-0.9-src.zip",

            // Python Program
            "--primary-py-file",
            "/home/lorenzo/script.py",

            // number of executors
            "--num-executors",
            "2",

            // driver memory
            "--driver-memory",
            "512m",

            // executor memory
            "--executor-memory",
            "512m",

            // executor cores
            "--executor-cores",
            "2",

            "--queue",
            "default",

            // argument 1 to my Spark program
            "--arg",
            null,
    };
    System.setProperty("SPARK_YARN_MODE", "true");
    System.out.println(SPARK_HOME);
    SparkLauncher sparkLauncher = new SparkLauncher();
    sparkLauncher.setSparkHome("/usr/hdp/current/spark2-client");
    sparkLauncher.setAppResource("/home/lorenzo/script.py");
    sparkLauncher.setMaster("yarn");
    sparkLauncher.setDeployMode("cluster");
    sparkLauncher.setVerbose(true);
    sparkLauncher.launch().waitFor();
}

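When I run this JAR from one of the machines in the cluster, nothing happens: no error, no log, no Yarn container, nothing at all. If I add a println to this code, the println output does appear.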
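One possible reason for the silence is that `launch()` returns a plain `java.lang.Process`, and the code above only calls `waitFor()` on it without ever reading the child's stdout or stderr. The sketch below shows one way to surface that output and the exit code. It uses only the JDK; the class name `ChildOutput`, its helper methods, and the stand-in `java -version` child are illustrative assumptions, not part of Spark. In the question's code the process would be the one returned by `sparkLauncher.launch()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Spark class): echoes a child process's output.
class ChildOutput {

    // Copies every line of one stream to our stdout and into `sink`.
    static void drain(InputStream in, String tag, List<String> sink) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(tag + " " + line);
                sink.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drains stdout and stderr concurrently so neither pipe can fill up
    // and block the child, then returns the child's exit code.
    static int drainAndWait(Process child, List<String> sink) throws InterruptedException {
        Thread stderr = new Thread(() -> drain(child.getErrorStream(), "[stderr]", sink));
        stderr.start();
        drain(child.getInputStream(), "[stdout]", sink);
        stderr.join();
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in child process: `java -version`. With Spark on the classpath,
        // this would instead be the Process returned by sparkLauncher.launch().
        String javaBin = System.getProperty("java.home") + "/bin/java";
        Process child = new ProcessBuilder(javaBin, "-version").start();

        List<String> lines = Collections.synchronizedList(new ArrayList<>());
        int exit = drainAndWait(child, lines);
        System.out.println("exit code: " + exit + ", lines captured: " + lines.size());
    }
}
```

A non-zero exit code together with the echoed stderr lines would normally show why spark-submit gave up before any Yarn container was requested.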

Did I misconfigure something?
If I want to run this JAR from a different machine, where and how should I declare the IP?
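On the question of where the address goes: Spark on Yarn normally does not take a ResourceManager IP directly. It reads the Hadoop client configuration (for example `yarn-site.xml`) from the directory named by `HADOOP_CONF_DIR` or `YARN_CONF_DIR`. The sketch below builds such an environment map and checks that the directory looks plausible. It assumes the cluster's client configuration has been copied to the submitting machine; the class `YarnClientEnv`, the method `forConfDir`, and the temporary directory are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Spark class): prepares the child environment.
class YarnClientEnv {

    // Returns environment variables pointing at the Hadoop client config,
    // failing fast if the directory does not contain yarn-site.xml.
    static Map<String, String> forConfDir(Path confDir) {
        if (!Files.isRegularFile(confDir.resolve("yarn-site.xml"))) {
            throw new IllegalArgumentException("no yarn-site.xml in " + confDir);
        }
        String dir = confDir.toAbsolutePath().toString();
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_CONF_DIR", dir);
        env.put("YARN_CONF_DIR", dir);
        return env;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real config directory copied from the cluster.
        Path confDir = Files.createTempDirectory("hadoop-conf");
        Files.createFile(confDir.resolve("yarn-site.xml"));

        Map<String, String> env = forConfDir(confDir);
        System.out.println(env);
        // With Spark on the classpath, the map can be handed to the launcher
        // through its Map-taking constructor: new SparkLauncher(env)
    }
}
```

The ResourceManager host and port then live in `yarn-site.xml` rather than in the Java code, which keeps the program itself independent of the cluster it submits to.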

Submitting PySpark to a Yarn cluster using Java

0 answers: