I use the head() function to get the first row of the aggregated dataset (it is the only row), as shown below:
numeratorDataset = vector0.join(vector1, "wikibase_item").map(new MapFunction<Row, TFIDFComponent>() {
    private static final long serialVersionUID = 1L;

    @Override
    public TFIDFComponent call(Row row) throws Exception {
        // Column 0 is the join key; columns 1 and 2 are multiplied into the component weight.
        return new TFIDFComponent(row.getString(0), row.getDouble(1) * row.getDouble(2));
    }
}, Encoders.bean(TFIDFComponent.class));

// Sum the weights and read the single aggregated row.
numerator = numeratorDataset.agg(org.apache.spark.sql.functions.sum(numeratorDataset.col("weight"))).head().getDouble(0);
I find that, even though the result is a single row, this takes far too long. Any idea why? How can I fix it?