My application needs to compute a series of values with Spark, and I am trying to make it metadata-driven.
[
  {
    "key": "myKeyName",
    "logic": "scala script"
  }
  ...
]
I have a JSON like the one above that will be submitted to Spark together with "app.jar". In the Spark job's main method, I want to load this JSON, execute each "logic" script inside Spark, and obtain the resulting value for its key. I think SparkContext.submitJob() is what I want, but I'm not sure and am still searching for a solution online. Any help is much appreciated, thanks in advance.
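For reference, each metadata entry could be modeled as a small driver-side class; the field names simply mirror the JSON above, while the class name and everything else here is only an illustrative assumption:

public class LogicEntry {              // hypothetical name, mirrors one JSON object above
    private String key;                // e.g. "myKeyName"
    private String logic;              // the Scala script to evaluate for this key
    public String getKey()   { return key; }
    public String getLogic() { return logic; }
    public void setKey(String key)     { this.key = key; }
    public void setLogic(String logic) { this.logic = logic; }
}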
The bundled jar is submitted to Spark via SparkLauncher:
final SparkLauncher launcher = new SparkLauncher()
        .setAppResource("path/to/app.jar")
        .setMainClass("the.main.class")
        .setMaster("spark.master")
        .setConf(SparkLauncher.DRIVER_MEMORY, "3g");
// add the other dependent jar files
launcher.startApplication();
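If the metadata file is not bundled inside app.jar, one option (my assumption, not something stated in the question) is to ship the dependent jars and pass the metadata path through the same launcher before calling startApplication():

// add the other dependent jar files and hand the metadata path to the driver
launcher.addJar("path/to/dependency.jar");       // hypothetical dependent jar
launcher.addAppArgs("path/to/metadata.json");    // hypothetical path, read as args[0] in the main class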
PS: The Spark application runs as a service inside Docker.
Answer 0 (score: 0)
Figured it out myself.
// ...
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import scala.tools.nsc.interpreter.IMain;

import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
// ...

private void scalaJob(SparkSession sparkSession, Dataset<Row> someData) {
    ScriptEngine e = new ScriptEngineManager().getEngineByName("scala");
    // tell the Scala interpreter to use the same classpath as the Java application
    ((IMain) e).settings().classpath().append(System.getProperty("java.class.path"));
    // pass the SparkSession and the Dataset into the script's engine scope
    e.getContext().setAttribute("sparkSession", sparkSession, ScriptContext.ENGINE_SCOPE);
    e.getContext().setAttribute("someData", someData, ScriptContext.ENGINE_SCOPE);
    try {
        // hello world
        String script = "object HelloWorld {\n";
        script += "def main(args: Array[String]): Unit = {\n";
        script += "println(\"Hello, world!\")\n";
        script += "}\n";
        script += "}\n";
        script += "HelloWorld.main(Array())";
        e.eval(script);
        // some serious work: the bound someData comes through untyped, hence the asInstanceOf cast
        script = "import org.apache.spark.sql.SparkSession\n";
        script += "import org.apache.spark.sql.Dataset\n";
        script += "import org.apache.spark.sql.Row\n";
        script += "val processedData = someData.asInstanceOf[Dataset[Row]]\n";
        script += "processedData.show(false)\n";
        script += "processedData\n";
        // getting back the result of the serious work (the value of the script's last expression)
        Dataset<Row> ds = (Dataset<Row>) e.eval(script);
        ds.show();
    } catch (ScriptException ex) {
        ex.printStackTrace();
        System.exit(1);
    }
}
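For context, a call site in the driver might look roughly like this; the app name and the source table are placeholders of mine, not taken from the question:

SparkSession sparkSession = SparkSession.builder()
        .appName("metadata-driven-job")                    // placeholder app name
        .getOrCreate();
Dataset<Row> someData = sparkSession.table("some_table");  // placeholder data source
scalaJob(sparkSession, someData);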
The script itself is loaded from the JSON metadata.
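A minimal sketch of that last step, assuming the metadata file path is known and Jackson (which ships with Spark) is on the classpath; the entry structure follows the JSON from the question, and the method name is only illustrative:

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
// ...

private Map<String, Object> runMetadataScripts(ScriptEngine e, String metadataPath) throws Exception {
    // read the [{ "key": ..., "logic": ... }, ...] array into a list of maps
    List<Map<String, String>> entries = new ObjectMapper()
            .readValue(new File(metadataPath), new TypeReference<List<Map<String, String>>>() {});
    Map<String, Object> results = new HashMap<>();
    for (Map<String, String> entry : entries) {
        // evaluate each "logic" script and keep its result under its "key"
        results.put(entry.get("key"), e.eval(entry.get("logic")));
    }
    return results;
}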
PS: This is just an example, not production code.