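I am using Jobserver Spark-2.0-preview with Apache Spark 2.1.1.
When I check the Spark UI, each query takes no more than 1 second to execute, but I only receive the response from jobserver after 10 seconds or so. I am querying Parquet files, and I am using Java.
Here is a sample of my code. Basically, I create a SparkSession from the SparkContext that jobserver provides.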
import java.util.List;

import com.typesafe.config.Config;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.SparkSession;

public class SOverflow extends VIQ_SparkJob {

    private static final String CUBE_USERS_V_DE = "spark_cube_users_v_de";
    private static final String CUBE_USERS_V_EN = "spark_cube_users_v_en";

    @Override
    public Object runJob(SparkContext jsc, Config jobConfig) {
        long startTime = System.currentTimeMillis();
        String query = jobConfig.getString("query");
        try {
            // Build (or reuse) a SparkSession on top of the SparkContext provided by jobserver.
            sparkSession = SparkSession.builder()
                    .sparkContext(jsc)
                    .enableHiveSupport()
                    .config("spark.sql.warehouse.dir", "file:///value_iq/spark-warehouse/")
                    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                    .config("spark.kryoserializer.buffer", "8m")
                    .getOrCreate();

            Class<?>[] classes = new Class<?>[1];
            classes[0] = SOverflow.class;
            sparkSession.sparkContext().conf().registerKryoClasses(classes);

            // Register the Parquet data and the MySQL lookup table as temporary views.
            getDataFrameFromParket("spark_cube_users_v" + "/tenant_id=" + getTENANT_ID()).createOrReplaceTempView("spark_cube_users_v");
            getDataFrameFromMySQL("value_iq", "lookup_values").filter(new Column("lookup_domain").equalTo("customer_type")).createOrReplaceTempView("lookup_values");

            // Build the language-specific views from queries stored in MySQL.
            sparkSession.sql(jdbcMySQL.getViewQuery(CUBE_USERS_V_DE)).createOrReplaceTempView("cube_users_v_de");
            sparkSession.sql(jdbcMySQL.getViewQuery(CUBE_USERS_V_EN)).createOrReplaceTempView("cube_users_v_en");

            // Run the requested query and return at most 10000 rows as a JSON array.
            List<String> list = sparkSession.sql(query).toJSON().takeAsList(10000);
            String result = "[" + String.join(",", list) + "]";

            log(MyLog.INFORMATION, "runJob()", "Took " + ((System.currentTimeMillis() - startTime) / 1000) + "s to execute query: " + query + ".");
            return result;
        } catch (Exception e) {
            // Log the failure instead of silently swallowing it.
            log(MyLog.INFORMATION, "runJob()", "Failed to execute query: " + query + ". Cause: " + e);
        }
        return null;
    }
}
My question is: why do I get such long response times from jobserver when the Spark UI shows that the queries themselves are really fast?
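
The log line in runJob() only reports one total, so it cannot show which step accounts for the gap between the Spark UI and the jobserver response. One way to narrow it down would be to time each step separately (session creation, Parquet view, MySQL view, the query itself, building the JSON). Below is a minimal sketch of that idea. StepTimer is a hypothetical helper I made up for illustration, it is not part of jobserver or Spark, and the step names and dummy work in main() are placeholders.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: records wall-clock time per named step, in first-seen order. */
final class StepTimer {

    private final Map<String, Long> millisByStep = new LinkedHashMap<>();

    /** Runs the step, adds its elapsed milliseconds to the step's total, and returns its result. */
    <T> T time(String step, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            millisByStep.merge(step, elapsedMs, Long::sum);
        }
    }

    /** Read-only view of the accumulated timings. */
    Map<String, Long> timings() {
        return Collections.unmodifiableMap(millisByStep);
    }

    /** One-line summary such as "session=12ms, query=3ms", suitable for a log message. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : millisByStep.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue()).append("ms");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();
        // Placeholder work standing in for the real steps of runJob().
        String views = timer.time("views", () -> String.valueOf(Math.sqrt(2.0)));
        Integer rows = timer.time("query", () -> views.length());
        System.out.println("rows=" + rows + " timings: " + timer.report());
    }
}
```

In runJob() each step could then be wrapped, for example timer.time("query", () -> sparkSession.sql(query).toJSON().takeAsList(10000)), and timer.report() passed to the existing log() call. If the steps inside runJob() add up to roughly one second, the remaining time is probably spent outside the job code; if one of the setup steps dominates, that step is the one to look at.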