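With Apache Toree we can evaluate arbitrary expressions on Spark. Suppose we want to run a SQL query, for example sqlContext.sql(..).
Is it possible to get the progress of such a SQL query (as Zeppelin does)? Perhaps Toree could expose some query metrics (such as "X tasks from N are done")?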
Answer (score: 1)
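Apache Zeppelin does this through sc.dagScheduler. It asks the DAG scheduler for the active jobs that belong to the paragraph's job group, adds up their total and completed tasks, and reports completed * 100 / total.
If you don't have direct access to the SparkContext, Spark's monitoring REST API (served by the driver's web UI under /api/v1) is probably the better option.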
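For the REST route, here is a minimal sketch, assuming Java 11+ (java.net.http) and a reachable driver UI (port 4040 by default). The endpoint /api/v1/applications/{appId}/jobs?status=running and the numTasks / numCompletedTasks fields come from Spark's monitoring REST API; the class and method names are hypothetical, and the regex field extraction only keeps the example dependency-free (a real client should use a JSON library).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Spark, Toree or Zeppelin.
final class RestProgress {
  // Naive extraction of the flat numeric fields in the /jobs response.
  // Assumption: good enough for a sketch; use a JSON library in practice.
  private static final Pattern NUM_TASKS =
      Pattern.compile("\"numTasks\"\\s*:\\s*(\\d+)");
  private static final Pattern NUM_COMPLETED_TASKS =
      Pattern.compile("\"numCompletedTasks\"\\s*:\\s*(\\d+)");

  private RestProgress() {}

  /** Sums every integer value of the given field found in the JSON text. */
  private static long sumField(Pattern field, String json) {
    Matcher m = field.matcher(json);
    long sum = 0;
    while (m.find()) {
      sum += Long.parseLong(m.group(1));
    }
    return sum;
  }

  /** Percent (0-100, truncated) of completed tasks over all jobs in the array. */
  static int progressFromJobsJson(String json) {
    long total = sumField(NUM_TASKS, json);
    if (total == 0) {
      return 0; // no running jobs, or no tasks yet
    }
    return (int) (sumField(NUM_COMPLETED_TASKS, json) * 100 / total);
  }

  /** Fetches the running jobs, e.g. uiBase = "http://localhost:4040". */
  static String fetchRunningJobs(String uiBase, String appId)
      throws IOException, InterruptedException {
    URI uri = URI.create(
        uiBase + "/api/v1/applications/" + appId + "/jobs?status=running");
    HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body();
  }
}
```

A caller would poll progressFromJobsJson(fetchRunningJobs("http://localhost:4040", appId)) while the query runs; inside a Toree cell, sc.applicationId gives the appId. This sums over all running jobs of the application, so concurrent queries are lumped together unless you also filter on the jobGroup field of each job entry.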
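package org.apache.zeppelin.spark;

import java.lang.reflect.InvocationTargetException;

import org.apache.spark.scheduler.ActiveJob;
import org.apache.spark.scheduler.DAGScheduler;
import org.apache.zeppelin.interpreter.Interpreter;
import org.apache.zeppelin.interpreter.InterpreterContext;

import scala.collection.Iterator;
import scala.collection.mutable.HashSet;

// Excerpt from Zeppelin's SparkInterpreter. The fields sc, sparkVersion,
// sparkListener and logger, and the helpers getJobGroup() and
// getProgressFromStage_1_0x/_1_1x(), are defined elsewhere in the class.
public class SparkInterpreter extends Interpreter {

  @Override
  public int getProgress(InterpreterContext context) {
    // Each paragraph runs its Spark jobs under its own job group id.
    String jobGroup = getJobGroup(context);
    int completedTasks = 0;
    int totalTasks = 0;

    // dagScheduler() and activeJobs() are Spark internals: package-private
    // in Scala, but public at the bytecode level, so Java can call them.
    DAGScheduler scheduler = sc.dagScheduler();
    if (scheduler == null) {
      return 0;
    }
    HashSet<ActiveJob> jobs = scheduler.activeJobs();
    if (jobs == null || jobs.size() == 0) {
      return 0;
    }

    Iterator<ActiveJob> it = jobs.iterator();
    while (it.hasNext()) {
      ActiveJob job = it.next();
      // Only count jobs that belong to this paragraph's job group.
      String g = (String) job.properties().get("spark.jobGroup.id");
      if (jobGroup.equals(g)) {
        int[] progressInfo = null;
        try {
          // finalStage is fetched reflectively, presumably so one binary
          // works across Spark versions whose ActiveJob/Stage classes differ.
          Object finalStage = job.getClass().getMethod("finalStage").invoke(job);
          if (sparkVersion.getProgress1_0()) {
            progressInfo = getProgressFromStage_1_0x(sparkListener, finalStage);
          } else {
            progressInfo = getProgressFromStage_1_1x(sparkListener, finalStage);
          }
        } catch (IllegalAccessException | IllegalArgumentException
            | InvocationTargetException | NoSuchMethodException
            | SecurityException e) {
          logger.error("Can't get progress info", e);
          return 0;
        }
        // progressInfo[0] = total tasks, progressInfo[1] = completed tasks
        totalTasks += progressInfo[0];
        completedTasks += progressInfo[1];
      }
    }

    if (totalTasks == 0) {
      return 0;
    }
    // Integer percentage (0-100) of completed tasks across the group's jobs.
    return completedTasks * 100 / totalTasks;
  }
}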
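Stripped of the Spark internals, getProgress boils down to a per-group sum followed by an integer percentage. The sketch below models just that arithmetic; JobProgress is a hypothetical stand-in for an ActiveJob plus the [total, completed] pair returned by getProgressFromStage_*, and none of this is Zeppelin code.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for one active Spark job: its job group and task counts. */
final class JobProgress {
  final String jobGroup;
  final int totalTasks;
  final int completedTasks;

  JobProgress(String jobGroup, int totalTasks, int completedTasks) {
    this.jobGroup = jobGroup;
    this.totalTasks = totalTasks;
    this.completedTasks = completedTasks;
  }
}

final class GroupProgress {
  private GroupProgress() {}

  /** Percent (0-100, truncated) of completed tasks over all jobs in the given group. */
  static int percentDone(List<JobProgress> activeJobs, String jobGroup) {
    int total = 0;
    int completed = 0;
    for (JobProgress job : activeJobs) {
      // Same filter as Zeppelin: ignore jobs from other groups (other cells).
      if (Objects.equals(jobGroup, job.jobGroup)) {
        total += job.totalTasks;
        completed += job.completedTasks;
      }
    }
    // No matching jobs (or no tasks yet): report 0 instead of dividing by zero.
    return total == 0 ? 0 : completed * 100 / total;
  }
}
```

For this filter to work from Toree, a cell would call sc.setJobGroup("some-id", "description") before sqlContext.sql(..); Spark records that id as the spark.jobGroup.id property of each job it submits, which is the property Zeppelin compares against.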