I need to create a Java program that submits a Python script (using PySpark) to a YARN cluster. As far as I can tell, using SparkLauncher amounts to the same thing as using a YarnClient, since it uses a built-in YARN client under the hood (writing my own YarnClient is madness; I tried, and there are far too many things to handle). So I wrote:
import org.apache.spark.launcher.SparkLauncher;

public static void main(String[] args) throws Exception {
    String SPARK_HOME = System.getProperty("SPARK_HOME");
    submit(SPARK_HOME, args);
}

static void submit(String SPARK_HOME, String[] args) throws Exception {
    // YarnClient-style arguments left over from my earlier attempt;
    // note they are never actually passed to SparkLauncher below
    String[] arguments = new String[]{
            // application name
            "--name",
            "SparkPi-Python",
            "--class",
            "org.apache.spark.deploy.PythonRunner",
            "--py-files",
            SPARK_HOME + "/python/lib/pyspark.zip," + SPARK_HOME + "/python/lib/py4j-0.9-src.zip",
            // Python program
            "--primary-py-file",
            "/home/lorenzo/script.py",
            // number of executors
            "--num-executors",
            "2",
            // driver memory
            "--driver-memory",
            "512m",
            // executor memory
            "--executor-memory",
            "512m",
            // executor cores
            "--executor-cores",
            "2",
            "--queue",
            "default",
            // argument 1 to my Spark program
            "--arg",
            null,
    };
    System.setProperty("SPARK_YARN_MODE", "true");
    System.out.println(SPARK_HOME);

    SparkLauncher sparkLauncher = new SparkLauncher();
    sparkLauncher.setSparkHome("/usr/hdp/current/spark2-client");
    sparkLauncher.setAppResource("/home/lorenzo/script.py");
    sparkLauncher.setMaster("yarn");
    sparkLauncher.setDeployMode("cluster");
    sparkLauncher.setVerbose(true);
    sparkLauncher.launch().waitFor();
}
When I run this jar from a machine in the cluster, nothing happens... no error, no log, no YARN container... nothing. If I add a println to this code, it does print.
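One thing I suspect: `SparkLauncher.launch()` returns a plain `java.lang.Process`, and I never read its stdout/stderr, so whatever `spark-submit` prints (including any failure) is silently discarded. Here is a minimal sketch of the stream-draining pattern I mean, using only the JDK's `ProcessBuilder` with `echo` as a stand-in for the spark-submit child process (the `echo` command and the `[child]` prefix are just illustrative):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class DrainChildOutput {
    public static void main(String[] args) throws Exception {
        // Stand-in for SparkLauncher.launch(): any Process whose output we must drain.
        Process p = new ProcessBuilder("echo", "hello from the child process")
                .redirectErrorStream(true) // merge stderr into stdout
                .start();

        // Read the child's output line by line; without this, its logs are lost
        // (and on some platforms an unread pipe can even block the child).
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println("[child] " + line);
            }
        }
        System.out.println("exit code: " + p.waitFor());
    }
}
```

Applied to my code, the idea would be to drain `sparkLauncher.launch().getInputStream()` and `getErrorStream()` the same way before `waitFor()`, so I can at least see what spark-submit is complaining about.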