Question

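I started a Spark job on my computer. The machine has 4 cores, and I set the worker memory to 5 GB. My master is on another machine in the same network, and that machine hosts no worker. My code looks like this: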

private void myClass() {
    // configuration of the spark context
    SparkConf conf = new SparkConf().setAppName("myWork").setMaster("spark://myHostIp:7077").set("spark.driver.allowMultipleContexts", "true");
    // creation of the spark context in which we will run the algorithm
    JavaSparkContext sc = new JavaSparkContext(conf);

    // algorithm
    for(int i = 0; i<200; i++) {
        System.out.println("===============================================================");
        System.out.println("iteration : " + i);
        System.out.println("===============================================================");
        ArrayList<Boolean> list = new ArrayList<Boolean>();
        for(int j = 0; j < 1900; j++){
            list.add(true);
        }
        JavaRDD<myObj> ratings = sc.parallelize(list, 100)
                .map(bool -> new myObj())
                .map(obj -> this.setupObj(obj))
                .map(obj -> this.moveObj(obj))
                .cache();
        int[] stuff = ratings
                .map(obj -> obj.getStuff())
                .reduce((obj1,obj2)->this.mergeStuff(obj1,obj2));
        this.setStuff(stuff);

        ArrayList<TabObj> tabObj = ratings
                .map(obj -> this.objToTabObjAsTab(obj))
                .reduce((obj1,obj2)->this.mergeTabObj(obj1,obj2));
        ratings.unpersist(false);

        this.setTabObj(tabObj);

    }

    sc.close(); 
}

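When I launch it, I can see the progress in the Spark UI, but it is really slow (I have to set the number of partitions in parallelize quite high, otherwise I get timeout problems). So I assumed it was a CPU bottleneck, but in fact the JVM's CPU consumption is very low (most of the time it is 0%, and only occasionally a bit over 5%...).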
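To make the numbers in the question concrete, here is a minimal sketch in plain Java (no Spark dependency) of how 1900 elements spread over 100 partitions, and how many tasks the loop schedules in total. The class and method names are hypothetical. The index arithmetic is an assumption about how parallelize slices a local collection into contiguous ranges; it is not something stated in the question. The count of two actions per iteration comes from the two reduce calls in the code above.

```java
import java.util.Arrays;

public class PartitionSizes {

    // Size of each slice when `length` elements are split into `numSlices`
    // contiguous ranges: slice i covers [i*length/numSlices, (i+1)*length/numSlices).
    // Assumption: this mirrors the slicing used for a parallelized local collection.
    public static int[] sliceSizes(int length, int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be >= 1");
        }
        int[] sizes = new int[numSlices];
        for (int i = 0; i < numSlices; i++) {
            int start = (int) ((long) i * length / numSlices);
            int end = (int) ((long) (i + 1) * length / numSlices);
            sizes[i] = end - start;
        }
        return sizes;
    }

    // Total number of tasks if every action runs one task per partition.
    public static long totalTasks(int numSlices, int actionsPerIteration, int iterations) {
        return (long) numSlices * actionsPerIteration * iterations;
    }

    public static void main(String[] args) {
        int[] sizes = sliceSizes(1900, 100);
        int min = Arrays.stream(sizes).min().getAsInt();
        int max = Arrays.stream(sizes).max().getAsInt();
        System.out.println("elements per partition (min/max): " + min + " / " + max);
        System.out.println("tasks over the whole loop: " + totalTasks(100, 2, 200));
    }
}
```

With these inputs each partition holds 19 elements and the loop schedules 40000 tasks. Whether the per-task overhead or the work inside setupObj and moveObj dominates cannot be determined from the question alone, so this should be read as a way to frame the measurement rather than as a diagnosis.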

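According to the monitor, the JVM is using 3 GB of memory, while according to the Spark UI only 19 MB is cached.

The master host is also a 4-core machine, and it only hosts the master. It has less memory (4 GB). On the master host I can see that CPU consumption is 100% (one full core), and I don't understand why it is that high... all it has to do is send partitions to the worker on the other machine, right?

I really don't understand why the CPU consumption on the worker is so low. If someone could enlighten me, that would be great.

Thanks.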

Answer 1

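Make sure you have submitted your Spark job through YARN or Mesos on the cluster; otherwise it may only run on the master node.
Since your code is very simple, the computation should finish very quickly, but I suggest trying the wordcount example on an input source of a few GB to see what the CPU consumption looks like.
Please use "local[*]". The * means that all of your cores are used for the computation.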

SparkConf sparkConf = new SparkConf().set("spark.driver.host", "localhost").setAppName("unit-testing").setMaster("local[*]");

Reference: https://spark.apache.org/docs/latest/configuration.html
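As a rough illustration of what the * stands for, the sketch below (plain Java, no Spark dependency) prints the number of logical cores visible to the JVM and builds the explicit local[K] master string for that count. It assumes that the thread count Spark picks for local[*] matches Runtime.availableProcessors(). The class and method names are hypothetical.

```java
public class LocalMaster {

    // Builds the explicit "local[K]" master string for a given thread count.
    public static String localMasterUrl(int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        return "local[" + threads + "]";
    }

    public static void main(String[] args) {
        // Logical cores visible to this JVM; assumed to be what local[*] resolves to.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("logical cores visible to this JVM: " + cores);
        System.out.println("explicit equivalent of local[*]: " + localMasterUrl(cores));
    }
}
```

Passing the resulting string to setMaster in place of "local[*]" is one way to pin the thread count explicitly, for example to compare a run with 1 thread against a run with 4.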
Many things in Spark can affect CPU and memory usage, for example the number of executors and the spark.executor.memory you want to allocate to each of them.

Low CPU usage when running a Spark job
