Spark Jobserver is too slow, not matching Apache Spark query times

Date: 2017-05-05 11:24:02

Tags: java apache-spark spark-jobserver

Using Jobserver Spark-2.0-preview and Apache Spark 2.1.1.

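When I check the Spark UI, no query takes more than 1 second to execute, but the response from the jobserver only arrives after 10 seconds or more.

[Screenshot: Spark Completed Queries]

I'm querying Parquet files, and I'm using Java.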

Here is a sample of my code. Basically, I create the SparkSession from the SparkContext provided by the jobserver.

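import java.util.List;

import org.apache.spark.SparkContext;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.SparkSession;

import com.typesafe.config.Config;

// Not Spark APIs: VIQ_SparkJob, jdbcMySQL, getDataFrameFromParket(), getDataFrameFromMySQL(),
// getTENANT_ID(), log() and MyLog are project-specific; the sparkSession field is presumably
// declared in VIQ_SparkJob.
public class SOverflow extends VIQ_SparkJob {

    private static final String CUBE_USERS_V_DE = "spark_cube_users_v_de";
    private static final String CUBE_USERS_V_EN = "spark_cube_users_v_en";

    @Override
    public Object runJob(SparkContext jsc, Config jobConfig) {
        long startTime = System.currentTimeMillis();
        String query = jobConfig.getString("query");
        try {
            // Wrap the SparkContext provided by the jobserver in a SparkSession.
            // Note: the serializer is created together with the SparkContext, so the Kryo
            // settings below (and registerKryoClasses) most likely have no effect here; they
            // would have to be set when the context itself is created.
            sparkSession = SparkSession.builder()
                    .sparkContext(jsc)
                    .enableHiveSupport()
                    .config("spark.sql.warehouse.dir", "file:///value_iq/spark-warehouse/")
                    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                    .config("spark.kryoserializer.buffer", "8m")
                    .getOrCreate();
            sparkSession.sparkContext().conf().registerKryoClasses(new Class<?>[] {SOverflow.class});

            // Register the tenant's Parquet data and the filtered MySQL lookup table as temp views.
            getDataFrameFromParket("spark_cube_users_v" + "/tenant_id=" + getTENANT_ID())
                    .createOrReplaceTempView("spark_cube_users_v");
            getDataFrameFromMySQL("value_iq", "lookup_values")
                    .filter(new Column("lookup_domain").equalTo("customer_type"))
                    .createOrReplaceTempView("lookup_values");

            // Build the derived views from SQL definitions stored in MySQL.
            sparkSession.sql(jdbcMySQL.getViewQuery(CUBE_USERS_V_DE)).createOrReplaceTempView("cube_users_v_de");
            sparkSession.sql(jdbcMySQL.getViewQuery(CUBE_USERS_V_EN)).createOrReplaceTempView("cube_users_v_en");

            // Run the query and return at most 10,000 rows as one JSON array string.
            List<String> rows = sparkSession.sql(query).toJSON().takeAsList(10000);
            String result = "[" + String.join(",", rows) + "]";
            log(MyLog.INFORMATION, "runJob()", "Took " + (System.currentTimeMillis() - startTime)
                    + " ms to execute query: " + query + ".");
            return result;
        } catch (Exception e) {
            // Don't swallow failures: rethrow so the jobserver reports the job as failed.
            throw new RuntimeException("Failed to execute query: " + query, e);
        }
    }
}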
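To narrow down where the extra seconds go, it can help to time each stage of runJob on its own (building the session, registering the Parquet view, loading the MySQL lookup table, creating the two derived views, running the query, and assembling the result) and compare those numbers with the durations shown in the Spark UI. The sketch below is a hypothetical plain-Java helper; `StageTimer` is not part of Spark or spark-jobserver. It only measures wall-clock time inside runJob, so any time the jobserver spends before or after calling runJob will not show up in it.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not a Spark or spark-jobserver API) that records how long
 * each named stage of a job takes, so per-stage wall-clock times can be compared
 * with the query durations reported by the Spark UI.
 */
public final class StageTimer {
    // Insertion-ordered so the report lists stages in the order they first ran.
    private final Map<String, Long> millisByStage = new LinkedHashMap<>();

    /** Runs the stage, records its wall-clock duration in milliseconds, and returns its result. */
    public <T> T time(String stage, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            // Recorded even if the stage throws; repeated stage names are summed.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            millisByStage.merge(stage, elapsedMs, Long::sum);
        }
    }

    /** Read-only view of the recorded durations in milliseconds. */
    public Map<String, Long> durations() {
        return Collections.unmodifiableMap(millisByStage);
    }

    /** Formats the durations as "stage=Nms" pairs, separated by ", ", in execution order. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : millisByStage.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue()).append("ms");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        // Stand-in for a real stage: join JSON rows into an array string.
        String json = timer.time("build JSON", () -> "[" + String.join(",", "{\"a\":1}", "{\"a\":2}") + "]");
        System.out.println(json);
        System.out.println(timer.report());
    }
}
```

Inside runJob, each step could be wrapped in the same way, for example `timer.time("register parquet view", () -> { ...; return null; })`, and `timer.report()` appended to the existing log line.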

My question is: why do I get such long response times from the jobserver when the queries shown in the Spark UI are really fast?

0 answers