java - Spark head() on a dataset with a single row

I use the head() function to get the first row (which is also the only row) of a dataset after aggregation, as follows:

numeratorDataset = vector0.join(vector1, "wikibase_item").map(new MapFunction<Row, TFIDFComponent>() {
    private static final long serialVersionUID = 1L;

    @Override
    public TFIDFComponent call(Row row) throws Exception {
        // Row layout after the join: the join key (wikibase_item) first, then one weight from each side
        return new TFIDFComponent(row.getString(0), row.getDouble(1)*row.getDouble(2));
    }
}, Encoders.bean(TFIDFComponent.class));

numerator = numeratorDataset.agg(org.apache.spark.sql.functions.sum(numeratorDataset.col("weight"))).head().getDouble(0);
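For reference, the pipeline above joins the two vectors on wikibase_item, multiplies the two weights per matching key, and sums the products into a single value. The following plain-Java sketch (no Spark) models that computation. It assumes each vector holds exactly one weight per wikibase_item key; the class and method names (NumeratorSketch, numerator) and the sample keys are hypothetical and only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Plain-Java model (no Spark) of what the pipeline computes:
 * inner-join two weight vectors on their key and sum the products.
 * Assumption: one weight per key in each vector.
 */
public class NumeratorSketch {

    // Hypothetical helper: each map stands in for one dataset,
    // keyed by wikibase_item with a single double weight per key.
    static double numerator(Map<String, Double> vector0, Map<String, Double> vector1) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : vector0.entrySet()) {
            Double other = vector1.get(e.getKey());
            if (other != null) {             // inner join: keep only keys present in both
                sum += e.getValue() * other; // same product as row.getDouble(1) * row.getDouble(2)
            }
        }
        return sum;                          // plays the role of agg(sum("weight"))
    }

    public static void main(String[] args) {
        Map<String, Double> v0 = new HashMap<>();
        v0.put("Q1", 0.5);
        v0.put("Q2", 2.0);
        v0.put("Q3", 1.0);
        Map<String, Double> v1 = new HashMap<>();
        v1.put("Q1", 4.0);
        v1.put("Q2", 0.25);
        v1.put("Q4", 9.0);
        // Only Q1 and Q2 match: 0.5 * 4.0 + 2.0 * 0.25
        System.out.println(numerator(v0, v1)); // prints 2.5
    }
}
```

The result is a single number no matter how many rows the join produces, which matches the single-row dataset that head() returns above.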

I noticed that even though the result is only one row, this takes far too much time. Any idea why? How can I fix it?
