Java 8 Stream: grouping with multiple Collectors

Time: 2015-08-18 11:54:56

Tags: java java-8 java-stream

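I want to group with Java 8 Streams by a single classifier, but with multiple Collector functions. So while grouping I would compute, for example, the average and the sum of one field (or possibly of another field).

I tried to simplify this with an example: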

public void test() {
    List<Person> persons = new ArrayList<>();
    persons.add(new Person("Person One", 1, 18));
    persons.add(new Person("Person Two", 1, 20));
    persons.add(new Person("Person Three", 1, 30));
    persons.add(new Person("Person Four", 2, 30));
    persons.add(new Person("Person Five", 2, 29));
    persons.add(new Person("Person Six", 3, 18));

    Map<Integer, Data> result = persons.stream().collect(
            groupingBy(person -> person.group, multiCollector)
    );
}

class Person {
    String name;
    int group;
    int age;

    // Constructor, getters and setters
}

class Data {
    long average;
    long sum;

    public Data(long average, long sum) {
        this.average = average;
        this.sum = sum;
    }

    // Getter and setter
}

The result should be a map that associates each group with its results, like:

1 => Data(average(18, 20, 30), sum(18, 20, 30))
2 => Data(average(30, 29), sum(30, 29))
3 => ....

This works fine with a single function such as Collectors.counting(), but I would like to chain more than one (ideally an arbitrary number, taken from a List).

List<Collector<Person, ?, ?>>

Is it possible to do something like this?

5 answers:

Answer 0 (score: 15)

For the concrete problem of summing and averaging, use collectingAndThen together with summarizingDouble:

Map<Integer, Data> result = persons.stream().collect(
        groupingBy(Person::getGroup, 
                collectingAndThen(summarizingDouble(Person::getAge), 
                        dss -> new Data((long)dss.getAverage(), (long)dss.getSum()))));

For the more generic problem (collecting various things about your persons), you can create a complex collector like this:

// Individual collectors are defined here
List<Collector<Person, ?, ?>> collectors = Arrays.asList(
        Collectors.averagingInt(Person::getAge),
        Collectors.summingInt(Person::getAge));

@SuppressWarnings("unchecked")
Collector<Person, List<Object>, List<Object>> complexCollector = Collector.of(
    () -> collectors.stream().map(Collector::supplier)
        .map(Supplier::get).collect(toList()),
    (list, e) -> IntStream.range(0, collectors.size()).forEach(
        i -> ((BiConsumer<Object, Person>) collectors.get(i).accumulator()).accept(list.get(i), e)),
    (l1, l2) -> {
        IntStream.range(0, collectors.size()).forEach(
            i -> l1.set(i, ((BinaryOperator<Object>) collectors.get(i).combiner()).apply(l1.get(i), l2.get(i))));
        return l1;
    },
    list -> {
        IntStream.range(0, collectors.size()).forEach(
            i -> list.set(i, ((Function<Object, Object>)collectors.get(i).finisher()).apply(list.get(i))));
        return list;
    });

Map<Integer, List<Object>> result = persons.stream().collect(
        groupingBy(Person::getGroup, complexCollector)); 

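The map values are lists whose first element is the result of applying the first collector, and so on. You can add a custom finisher step with Collectors.collectingAndThen(complexCollector, list -> ...) to convert this list into something more appropriate.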
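As a rough sketch of that finisher step, the list can be unpacked into the Data class from the question. The helper name combine, the class name ListToDataExample and the casts to Double and Integer are assumptions for illustration; the casts rely on averagingInt sitting at index 0 and summingInt at index 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ListToDataExample {

    static class Person {
        final String name;
        final int group;
        final int age;
        Person(String name, int group, int age) { this.name = name; this.group = group; this.age = age; }
        int getGroup() { return group; }
        int getAge() { return age; }
    }

    static class Data {
        final long average;
        final long sum;
        Data(long average, long sum) { this.average = average; this.sum = sum; }
        @Override public String toString() { return "Data(average=" + average + ", sum=" + sum + ")"; }
    }

    // Hypothetical helper: the same idea as complexCollector, written with plain loops.
    @SuppressWarnings("unchecked")
    static Collector<Person, List<Object>, List<Object>> combine(List<Collector<Person, ?, ?>> collectors) {
        return Collector.of(
            () -> {
                // One intermediate container per nested collector.
                List<Object> containers = new ArrayList<>();
                for (Collector<Person, ?, ?> c : collectors) {
                    containers.add(c.supplier().get());
                }
                return containers;
            },
            (list, p) -> {
                for (int i = 0; i < collectors.size(); i++) {
                    ((BiConsumer<Object, Person>) collectors.get(i).accumulator()).accept(list.get(i), p);
                }
            },
            (l1, l2) -> {
                for (int i = 0; i < collectors.size(); i++) {
                    l1.set(i, ((BinaryOperator<Object>) collectors.get(i).combiner()).apply(l1.get(i), l2.get(i)));
                }
                return l1;
            },
            list -> {
                // Replace each container with its finished result.
                for (int i = 0; i < collectors.size(); i++) {
                    list.set(i, ((Function<Object, Object>) collectors.get(i).finisher()).apply(list.get(i)));
                }
                return list;
            });
    }

    static Map<Integer, Data> summarize(List<Person> persons) {
        List<Collector<Person, ?, ?>> collectors = Arrays.asList(
            Collectors.averagingInt(Person::getAge),  // index 0, yields a Double
            Collectors.summingInt(Person::getAge));   // index 1, yields an Integer
        return persons.stream().collect(Collectors.groupingBy(
            Person::getGroup,
            Collectors.collectingAndThen(combine(collectors),
                list -> new Data(((Double) list.get(0)).longValue(),
                                 ((Integer) list.get(1)).longValue()))));
    }

    static List<Person> samplePersons() {
        return Arrays.asList(
            new Person("Person One", 1, 18), new Person("Person Two", 1, 20),
            new Person("Person Three", 1, 30), new Person("Person Four", 2, 30),
            new Person("Person Five", 2, 29), new Person("Person Six", 3, 18));
    }

    public static void main(String[] args) {
        System.out.println(summarize(samplePersons()));
    }
}
```

With the sample data from the question, group 1 ends up with sum 68 and average 22 (68 / 3, truncated by longValue()). The positional casts are the weak point of this approach: reordering the collector list silently breaks the finisher.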

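Answer 1 (score: 4)

By using a map as the output type, you can have a potentially unbounded list of reducers, each of which produces its own statistic and adds it to the map.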

public static <K, V> Map<K, V> addMap(Map<K, V> map, K k, V v) {
    Map<K, V> mapout = new HashMap<K, V>();
    mapout.putAll(map);
    mapout.put(k, v);
    return mapout;
}

...

    List<Person> persons = new ArrayList<>();
    persons.add(new Person("Person One", 1, 18));
    persons.add(new Person("Person Two", 1, 20));
    persons.add(new Person("Person Three", 1, 30));
    persons.add(new Person("Person Four", 2, 30));
    persons.add(new Person("Person Five", 2, 29));
    persons.add(new Person("Person Six", 3, 18));

    List<BiFunction<Map<String, Integer>, Person, Map<String, Integer>>> listOfReducers = new ArrayList<>();

    listOfReducers.add((m, p) -> addMap(m, "Count", Optional.ofNullable(m.get("Count")).orElse(0) + 1));
    // Sums the group field (the second constructor argument): 1+1+1+2+2+3 = 10 for the sample data
    listOfReducers.add((m, p) -> addMap(m, "Sum", Optional.ofNullable(m.get("Sum")).orElse(0) + p.group));

    BiFunction<Map<String, Integer>, Person, Map<String, Integer>> applyList
            = (mapin, p) -> {
                Map<String, Integer> mapout = mapin;
                for (BiFunction<Map<String, Integer>, Person, Map<String, Integer>> f : listOfReducers) {
                    mapout = f.apply(mapout, p);
                }
                return mapout;
            };
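    BinaryOperator<Map<String, Integer>> combineMaps
            = (map1, map2) -> {
                // Add up the partial statistics per key; a plain putAll would
                // overwrite map1's values when the stream runs in parallel
                Map<String, Integer> mapout = new HashMap<>(map1);
                map2.forEach((k, v) -> mapout.merge(k, v, Integer::sum));
                return mapout;
            };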
    Map<String, Integer> map
            = persons
            .stream()
            .reduce(new HashMap<String, Integer>(),
                    applyList, combineMaps);
    System.out.println("map = " + map);

Output:

map = {Sum=10, Count=6}

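Answer 2 (score: 3)

You can chain them.

A collector can only produce one object, but this object can hold multiple values. You could return a Map, for example, where the map has an entry for each collector you are returning.

You can use Collector.of(HashMap::new, accumulator, combiner);

Your accumulator would hold a map of collectors, where the keys of the produced map match the names of the collectors. The combiner also needs a way to merge multiple partial results, especially when the stream is processed in parallel.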
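A minimal sketch of this idea follows. It is simplified: instead of holding a map of nested collectors, the accumulator writes two named statistics directly into the result map. The names ageStats, statsByGroup and NamedStatsExample, and the keys "count" and "sum", are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class NamedStatsExample {

    static class Person {
        final String name;
        final int group;
        final int age;
        Person(String name, int group, int age) { this.name = name; this.group = group; this.age = age; }
        int getGroup() { return group; }
        int getAge() { return age; }
    }

    // Hypothetical collector: one map entry per named statistic.
    static Collector<Person, Map<String, Integer>, Map<String, Integer>> ageStats() {
        return Collector.of(
            HashMap::new,
            (map, p) -> {
                map.merge("count", 1, Integer::sum);
                map.merge("sum", p.getAge(), Integer::sum);
            },
            (left, right) -> {
                // Needed for parallel streams: add the partial results key by key.
                right.forEach((key, value) -> left.merge(key, value, Integer::sum));
                return left;
            });
    }

    static Map<Integer, Map<String, Integer>> statsByGroup(List<Person> persons) {
        return persons.stream()
            .collect(Collectors.groupingBy(Person::getGroup, ageStats()));
    }

    static List<Person> samplePersons() {
        return Arrays.asList(
            new Person("Person One", 1, 18), new Person("Person Two", 1, 20),
            new Person("Person Three", 1, 30), new Person("Person Four", 2, 30),
            new Person("Person Five", 2, 29), new Person("Person Six", 3, 18));
    }

    public static void main(String[] args) {
        System.out.println(statsByGroup(samplePersons()));
    }
}
```

Both statistics here can be merged by addition, which keeps the combiner trivial. A statistic such as an average would need its count and sum kept in the map and divided only at the end.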

In general, the built-in collectors use a dedicated data type for complex results.

From Collectors:

public static <T>
Collector<T, ?, DoubleSummaryStatistics> summarizingDouble(ToDoubleFunction<? super T> mapper) {
    return new CollectorImpl<T, DoubleSummaryStatistics, DoubleSummaryStatistics>(
            DoubleSummaryStatistics::new,
            (r, t) -> r.accept(mapper.applyAsDouble(t)),
            (l, r) -> { l.combine(r); return l; }, CH_ID);
}

And in its own class:

public class DoubleSummaryStatistics implements DoubleConsumer {
    private long count;
    private double sum;
    private double sumCompensation; // Low order bits of sum
    private double simpleSum; // Used to compute right sum for non-finite inputs
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

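Answer 3 (score: 3)

Instead of chaining collectors, you should build an abstraction that is an aggregator of collectors: implement the Collector interface with a class that accepts a list of collectors and delegates each method call to each of them. Then, at the end, return new Data() with all the results the nested collectors produced.

You can avoid creating a custom class with all the method declarations by using Collector.of(supplier, accumulator, combiner, finisher, Collector.Characteristics... characteristics). The finisher lambda calls the finisher of each nested collector and then returns the Data instance.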
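One way this could look is sketched below, limited to two nested collectors so that it stays type-safe without casts. The helper both, its local Pair class and the class name AggregatingCollectorExample are hypothetical; a version that takes a list of collectors would need unchecked casts instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class AggregatingCollectorExample {

    static class Person {
        final String name;
        final int group;
        final int age;
        Person(String name, int group, int age) { this.name = name; this.group = group; this.age = age; }
        int getGroup() { return group; }
        int getAge() { return age; }
    }

    static class Data {
        final long average;
        final long sum;
        Data(long average, long sum) { this.average = average; this.sum = sum; }
        @Override public String toString() { return "Data(average=" + average + ", sum=" + sum + ")"; }
    }

    // Hypothetical helper: delegates every call to both nested collectors
    // and merges their finished results in the finisher.
    static <T, A1, A2, R1, R2, R> Collector<T, ?, R> both(
            Collector<T, A1, R1> first, Collector<T, A2, R2> second,
            BiFunction<R1, R2, R> merger) {
        // Holds the intermediate containers of the two nested collectors.
        class Pair {
            A1 left = first.supplier().get();
            A2 right = second.supplier().get();
        }
        return Collector.<T, Pair, R>of(
            Pair::new,
            (pair, element) -> {
                first.accumulator().accept(pair.left, element);
                second.accumulator().accept(pair.right, element);
            },
            (p, q) -> {
                p.left = first.combiner().apply(p.left, q.left);
                p.right = second.combiner().apply(p.right, q.right);
                return p;
            },
            pair -> merger.apply(first.finisher().apply(pair.left),
                                 second.finisher().apply(pair.right)));
    }

    static Map<Integer, Data> summarize(List<Person> persons) {
        Collector<Person, ?, Data> toData = both(
            Collectors.averagingInt(Person::getAge),
            Collectors.summingInt(Person::getAge),
            (Double average, Integer sum) -> new Data(average.longValue(), sum.longValue()));
        return persons.stream().collect(Collectors.groupingBy(Person::getGroup, toData));
    }

    static List<Person> samplePersons() {
        return Arrays.asList(
            new Person("Person One", 1, 18), new Person("Person Two", 1, 20),
            new Person("Person Three", 1, 30), new Person("Person Four", 2, 30),
            new Person("Person Five", 2, 29), new Person("Person Six", 3, 18));
    }

    public static void main(String[] args) {
        System.out.println(summarize(samplePersons()));
    }
}
```

Because both returns an ordinary Collector, calls can be nested to aggregate more than two collectors, at the cost of nested merger functions.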

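Answer 4 (score: 0)

In Java 12, the Collectors API was extended with a static teeing(...) function:

teeing(Collector downstream1, Collector downstream2, BiFunction merger)

This provides built-in functionality to use two collectors on one Stream and merge the results into one object.

Below is a small example in which a list of employees is split into age groups; for each group, the two Collectors.summarizingInt() performed on age and salary are returned as a list of IntSummaryStatistics: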

import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CollectorTeeingTest {

public static void main(String... args){

    NavigableSet<Integer> age_groups = new TreeSet<>();
    age_groups.addAll(List.of(30,40,50,60,Integer.MAX_VALUE)); //we don't want to map to null

    Function<Integer,Integer> to_age_groups = age -> age_groups.higher(age);

    List<Employee> employees = List.of( new Employee("A",21,2000),
                                        new Employee("B",24,2400),
                                        new Employee("C",32,3000),
                                        new Employee("D",40,4000),
                                        new Employee("E",41,4100),
                                        new Employee("F",61,6100)
    );

    Map<Integer,List<IntSummaryStatistics>> stats = employees.stream()
            .collect(Collectors.groupingBy(
                employee -> to_age_groups.apply(employee.getAge()),
                Collectors.teeing(
                    Collectors.summarizingInt(Employee::getAge),
                    Collectors.summarizingInt(Employee::getSalary),
                    (stat1, stat2) -> List.of(stat1,stat2))));

    stats.entrySet().stream().forEach(entry -> {
        System.out.println("Age-group: <"+entry.getKey()+"\n"+entry.getValue());
    });
}

public static class Employee{

    private final String name;
    private final int age;
    private final int salary;

    public Employee(String name, int age, int salary){
        
        this.name = name;
        this.age = age;
        this.salary = salary;
    }
    public String getName(){return this.name;}
    public int getAge(){return this.age;}
    public int getSalary(){return this.salary;}
}

}

Output:

Age-group: <2147483647
[IntSummaryStatistics{count=1, sum=61, min=61, average=61,000000, max=61}, IntSummaryStatistics{count=1, sum=6100, min=6100, average=6100,000000, max=6100}]
Age-group: <50
[IntSummaryStatistics{count=2, sum=81, min=40, average=40,500000, max=41}, IntSummaryStatistics{count=2, sum=8100, min=4000, average=4050,000000, max=4100}]
Age-group: <40
[IntSummaryStatistics{count=1, sum=32, min=32, average=32,000000, max=32}, IntSummaryStatistics{count=1, sum=3000, min=3000, average=3000,000000, max=3000}]
Age-group: <30
[IntSummaryStatistics{count=2, sum=45, min=21, average=22,500000, max=24}, IntSummaryStatistics{count=2, sum=4400, min=2000, average=2200,000000, max=2400}]
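
Applied to the Person and Data classes from the question, teeing could be used roughly as follows. This is a sketch that assumes Java 12 or later; the class name TeeingPersonExample and the helpers summarize and samplePersons are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TeeingPersonExample {

    static class Person {
        final String name;
        final int group;
        final int age;
        Person(String name, int group, int age) { this.name = name; this.group = group; this.age = age; }
        int getGroup() { return group; }
        int getAge() { return age; }
    }

    static class Data {
        final long average;
        final long sum;
        Data(long average, long sum) { this.average = average; this.sum = sum; }
        @Override public String toString() { return "Data(average=" + average + ", sum=" + sum + ")"; }
    }

    static Map<Integer, Data> summarize(List<Person> persons) {
        return persons.stream().collect(Collectors.groupingBy(
            Person::getGroup,
            Collectors.teeing(
                Collectors.averagingInt(Person::getAge),  // yields a Double
                Collectors.summingInt(Person::getAge),    // yields an Integer
                (average, sum) -> new Data(average.longValue(), sum.longValue()))));
    }

    static List<Person> samplePersons() {
        return Arrays.asList(
            new Person("Person One", 1, 18), new Person("Person Two", 1, 20),
            new Person("Person Three", 1, 30), new Person("Person Four", 2, 30),
            new Person("Person Five", 2, 29), new Person("Person Six", 3, 18));
    }

    public static void main(String[] args) {
        System.out.println(summarize(samplePersons()));
    }
}
```

teeing accepts exactly two downstream collectors. For three or more, teeing calls can be nested, with each merger carrying the partial result forward.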