How to transform a Java stream into a sliding window?

Posted: 2015-12-08 14:38:44

Tags: java java-8 java-stream

What is the recommended way to transform a stream into a sliding window?

For instance, in Ruby you could use each_cons:

irb(main):020:0> [1,2,3,4].each_cons(2) { |x| puts x.inspect }
[1, 2]
[2, 3]
[3, 4]
=> nil
irb(main):021:0> [1,2,3,4].each_cons(3) { |x| puts x.inspect }
[1, 2, 3]
[2, 3, 4]
=> nil

In Guava, I found only Iterators#partition, which is related but does not provide a sliding window.

6 answers:

Answer 0 (score: 15)

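There's no such function in the API. The API supports both sequential and parallel processing, and it's really hard to provide an efficient parallel sliding-window function for an arbitrary stream source. Even efficient parallel processing of adjacent pairs is quite hard; I implemented it, so I know.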

However, if your source is a List with fast random access, you can use the subList() method to get the desired behavior, like this:

public static <T> Stream<List<T>> sliding(List<T> list, int size) {
    if(size > list.size()) 
        return Stream.empty();
    return IntStream.range(0, list.size()-size+1)
                    .mapToObj(start -> list.subList(start, start+size));
}
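The windows come from an `IntStream` index range, which splits evenly. That should make this approach work with parallel streams too, as long as the list supports fast random access (e.g. `ArrayList`). Below is a minimal self-contained sketch. It repeats the method above inside a hypothetical `SlidingSubLists` class and runs it in parallel:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SlidingSubLists {

    // Same idea as the method above: each window is a subList() view starting at `start`.
    public static <T> Stream<List<T>> sliding(List<T> list, int size) {
        if (size > list.size())
            return Stream.empty();
        return IntStream.range(0, list.size() - size + 1)
                        .mapToObj(start -> list.subList(start, start + size));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 10; i++) data.add(i);

        // The index range splits evenly, so the stream can be processed in parallel;
        // collect() still preserves encounter order.
        List<Integer> windowSums = sliding(data, 3)
                .parallel()
                .map(w -> w.stream().mapToInt(Integer::intValue).sum())
                .collect(Collectors.toList());

        System.out.println(windowSums); // [6, 9, 12, 15, 18, 21, 24, 27]
    }
}
```

Each window is a `subList()` view backed by the original list. Copy it (for example with `new ArrayList<>(window)`) if the list may be modified while the windows are in use.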

My StreamEx library actually provides a similar method: see StreamEx.ofSubLists().

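There are also some other third-party solutions that don't care about parallel processing and provide sliding functionality using an internal buffer. For example, protonpack's StreamUtils.windowed.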
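The buffering idea can be sketched with the plain JDK, for illustration only. This is not protonpack's actual code, and the names `BufferedSliding` and `windowed` are hypothetical. The sketch wraps the stream's iterator and keeps at most `size` elements in an `ArrayDeque`. Because of that, it also works on infinite streams, but it is sequential-only:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public final class BufferedSliding {

    // Lazily emits full windows of `size` elements, starting a new window every `step` elements.
    // At most `size` elements are buffered, so infinite sources work; a trailing partial window
    // is dropped. Sequential only: the result cannot be split for parallel processing.
    public static <T> Stream<List<T>> windowed(Stream<T> source, int size, int step) {
        if (size < 1 || step < 1) throw new IllegalArgumentException("size and step must be >= 1");
        Iterator<T> it = source.iterator();
        Iterator<List<T>> windows = new Iterator<List<T>>() {
            private final Deque<T> buffer = new ArrayDeque<>(size);
            private List<T> pending; // window computed by hasNext() but not yet returned
            private int toSkip;      // elements to discard between windows when step > size

            @Override
            public boolean hasNext() {
                if (pending != null) return true;
                while (toSkip > 0 && it.hasNext()) { it.next(); toSkip--; }
                while (buffer.size() < size && it.hasNext()) buffer.addLast(it.next());
                if (buffer.size() < size) return false;
                pending = new ArrayList<>(buffer); // snapshot of the current window
                for (int i = 0; i < Math.min(step, size); i++) buffer.removeFirst();
                toSkip = Math.max(0, step - size);
                return true;
            }

            @Override
            public List<T> next() {
                if (!hasNext()) throw new NoSuchElementException();
                List<T> window = pending;
                pending = null;
                return window;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(windows, Spliterator.ORDERED), false)
            .onClose(source::close);
    }

    public static void main(String[] args) {
        // Infinite source 1, 2, 3, ...: windows of 3, advancing by 2.
        List<List<Integer>> firstThree = windowed(Stream.iterate(1, i -> i + 1), 3, 2)
                .limit(3)
                .collect(Collectors.toList());
        System.out.println(firstThree); // [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
    }
}
```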

Answer 1 (score: 11)

If you're willing to use a third-party library and don't need parallelism, jOOλ offers SQL-style window functions, as follows:

int n = 2;

System.out.println(
Seq.of(1, 2, 3, 4)
   .window(0, n - 1)
   .filter(w -> w.count() == n)
   .map(w -> w.window().toList())
   .toList()
);

yielding

[[1, 2], [2, 3], [3, 4]]

int n = 3;

System.out.println(
Seq.of(1, 2, 3, 4)
   .window(0, n - 1)
   .filter(w -> w.count() == n)
   .map(w -> w.window().toList())
   .toList()
);

yielding

[[1, 2, 3], [2, 3, 4]]

Here's a blog post about how this works

Disclaimer: I work for the company behind jOOλ

Answer 2 (score: 6)

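Another option is cyclops-react, which builds on top of jOOλ's Seq interface (and the JDK 8 Stream). Its simple-react part builds concurrency/parallelism into this by creating Streams of Futures, if that is important to you.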

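You can use Lukas's powerful window functions with either library, since we extend the awesome jOOλ. There is also a sliding operator, which I think simplifies things in this case and is suitable for infinite streams: it doesn't consume the stream, but buffers values as they flow through.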

With ReactiveSeq it would look like this:

ReactiveSeq.of(1, 2, 3, 4)
           .sliding(2)
           .forEach(System.out::println);

With LazyFutureStream it would look like the example below:

 LazyFutureStream.iterate(1,i->i+1)
                 .sliding(3,2) //lists of 3, increment 2
                 .forEach(System.out::println);

Equivalent static methods for creating a sliding view over a java.util.stream.Stream are also available in the StreamUtils class of cyclops-streams.

       StreamUtils.sliding(Stream.of(1,2,3,4),2)
                  .map(Pair::new);

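If you want to work directly with each sliding view, you can use the slidingT operator, which returns a List Transformer. For example, the following adds a number to every element in each sliding view and then reduces each sliding window to the sum of its elements: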

        ReactiveSeq<Integer> windowsSummed = ReactiveSeq.fromIterable(data)
                                                        .slidingT(3)
                                                        .map(a->a+toAdd)
                                                        .reduce(0,(a,b)->a+b)
                                                        .stream();
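For comparison, the same per-window computation can be sketched with the plain JDK. It combines the `subList()` approach from the first answer with an ordinary map and sum. The sketch assumes `data` is a random-access `List<Integer>` and uses windows of size 3 as above. The names `WindowSums` and `windowSums` are hypothetical, and this is not how cyclops-react's `slidingT` is implemented:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class WindowSums {

    // For each window of `size` consecutive elements, add `toAdd` to every element
    // and reduce the window to the sum of the adjusted elements.
    public static List<Integer> windowSums(List<Integer> data, int size, int toAdd) {
        if (size < 1 || size > data.size()) return Collections.emptyList();
        return IntStream.range(0, data.size() - size + 1)
                .mapToObj(start -> data.subList(start, start + size))
                .map(window -> window.stream().mapToInt(x -> x + toAdd).sum())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(windowSums(data, 3, 10)); // [36, 39, 42]
    }
}
```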

Disclaimer: I work for the company behind cyclops-react

Answer 3 (score: 2)

If you want to bring the full power of Scala's persistent collections to Java, you can use the Javaslang library:

// this imports List, Stream, Iterator, ...
import javaslang.collection.*;

Iterator.range(1, 5).sliding(3)
        .forEach(System.out::println);
// --->
// List(1, 2, 3)
// List(2, 3, 4)

Iterator.range(1, 5).sliding(2, 3)
        .forEach(System.out::println);
// --->
// List(1, 2)
// List(4)

Iterator.ofAll(javaStream).sliding(3);

This works not only with Iterator but with almost any other Javaslang collection: Array, Vector, List, Stream, Queue, HashSet, LinkedHashSet, TreeSet, ...
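The following plain-JDK sketch for random-access lists makes the `size`/`step` semantics of the second Javaslang example concrete. It is an illustration written to reproduce the two outputs shown above, including the shorter trailing window `List(4)`. It is not Javaslang's implementation, and the class name `SlidingWithStep` is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlidingWithStep {

    // Windows of up to `size` elements, starting every `step` elements. The final window may be
    // shorter than `size`; iteration stops once a window reaches the end of the list.
    public static <T> List<List<T>> sliding(List<T> list, int size, int step) {
        if (size < 1 || step < 1) throw new IllegalArgumentException("size and step must be >= 1");
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < list.size(); start += step) {
            int end = Math.min(start + size, list.size());
            result.add(new ArrayList<>(list.subList(start, end))); // copy, not a view
            if (end == list.size()) break; // any later window would only be a suffix of this one
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4); // same elements as Iterator.range(1, 5)
        System.out.println(sliding(xs, 3, 1)); // [[1, 2, 3], [2, 3, 4]]
        System.out.println(sliding(xs, 2, 3)); // [[1, 2], [4]]
    }
}
```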


(Javaslang 2.1.0-alpha overview)

Disclaimer: I'm the creator of Javaslang

Answer 4 (score: 0)

I found a solution on Tomek Nurkiewicz's blog (https://www.nurkiewicz.com/2014/07/grouping-sampling-and-batching-custom.html). You can use the SlidingCollector below:

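import java.util.*;
import java.util.function.*;
import java.util.stream.Collector;

import static java.lang.Math.max;
import static java.util.stream.Collectors.toList;

// Sequential-only collector: combiner() throws. The buffer and counter live in the collector
// instance itself rather than in the result container, so create a new SlidingCollector per stream.
public class SlidingCollector<T> implements Collector<T, List<List<T>>, List<List<T>>> {

    private final int size;
    private final int step;
    private final int window;
    private final Queue<T> buffer = new ArrayDeque<>();
    private int totalIn = 0;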

    public SlidingCollector(int size, int step) {
        this.size = size;
        this.step = step;
        this.window = max(size, step);
    }

    @Override
    public Supplier<List<List<T>>> supplier() {
        return ArrayList::new;
    }

    @Override
    public BiConsumer<List<List<T>>, T> accumulator() {
        return (lists, t) -> {
            buffer.offer(t);
            ++totalIn;
            if (buffer.size() == window) {
                dumpCurrent(lists);
                shiftBy(step);
            }
        };
    }

    @Override
    public Function<List<List<T>>, List<List<T>>> finisher() {
        return lists -> {
            if (!buffer.isEmpty()) {
                final int totalOut = estimateTotalOut();
                if (totalOut > lists.size()) {
                    dumpCurrent(lists);
                }
            }
            return lists;
        };
    }

    private int estimateTotalOut() {
        return max(0, (totalIn + step - size - 1) / step) + 1;
    }

    private void dumpCurrent(List<List<T>> lists) {
        final List<T> batch = buffer.stream().limit(size).collect(toList());
        lists.add(batch);
    }

    private void shiftBy(int by) {
        for (int i = 0; i < by; i++) {
            buffer.remove();
        }
    }

    @Override
    public BinaryOperator<List<List<T>>> combiner() {
        return (l1, l2) -> {
            throw new UnsupportedOperationException("Combining not possible");
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        return EnumSet.noneOf(Characteristics.class);
    }

}
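In the collector above, the buffer and counter are fields of the `SlidingCollector` instance, not of the result container. A single instance therefore should not be reused across streams. The alternative below keeps all mutable state in the container created by `Collector.of`. It is a minimal sketch under stated assumptions: full windows only, step fixed at 1, sequential streams only. The names `SlidingCollectors` and `State` are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class SlidingCollectors {

    // Mutable accumulation state: the current window buffer plus the windows emitted so far.
    private static final class State<T> {
        final Deque<T> buffer = new ArrayDeque<>();
        final List<List<T>> windows = new ArrayList<>();
    }

    // Collects full windows of `size` consecutive elements (step 1); inputs shorter than
    // `size` produce no windows. Sequential streams only: the combiner throws.
    public static <T> Collector<T, ?, List<List<T>>> sliding(int size) {
        if (size < 1) throw new IllegalArgumentException("size must be >= 1");
        return Collector.of(
                () -> new State<T>(),
                (state, t) -> {
                    state.buffer.addLast(t);
                    if (state.buffer.size() == size) {
                        state.windows.add(new ArrayList<>(state.buffer)); // snapshot the full window
                        state.buffer.removeFirst();                        // slide forward by one
                    }
                },
                (left, right) -> {
                    throw new UnsupportedOperationException("parallel collection not supported");
                },
                state -> state.windows);
    }

    public static void main(String[] args) {
        List<List<Integer>> windows = IntStream.rangeClosed(1, 4).boxed().collect(sliding(3));
        System.out.println(windows); // [[1, 2, 3], [2, 3, 4]]
    }
}
```

Unlike the Spock expectations below (e.g. `[1, 2]` with size 3 giving `[[1, 2]]`), this sketch drops inputs that never fill a whole window.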

Below are some of Tomek's examples in Spock (I hope they're readable):

import static com.nurkiewicz.CustomCollectors.sliding

@Unroll
class CustomCollectorsSpec extends Specification {

    def "Sliding window of #input with size #size and step of 1 is #output"() {
        expect:
        input.stream().collect(sliding(size)) == output

        where:
        input  | size | output
        []     | 5    | []
        [1]    | 1    | [[1]]
        [1, 2] | 1    | [[1], [2]]
        [1, 2] | 2    | [[1, 2]]
        [1, 2] | 3    | [[1, 2]]
        1..3   | 3    | [[1, 2, 3]]
        1..4   | 2    | [[1, 2], [2, 3], [3, 4]]
        1..4   | 3    | [[1, 2, 3], [2, 3, 4]]
        1..7   | 3    | [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
        1..7   | 6    | [1..6, 2..7]
    }

    def "Sliding window of #input with size #size and no overlapping is #output"() {
        expect:
        input.stream().collect(sliding(size, size)) == output

        where:
        input | size | output
        []    | 5    | []
        1..3  | 2    | [[1, 2], [3]]
        1..4  | 4    | [1..4]
        1..4  | 5    | [1..4]
        1..7  | 3    | [1..3, 4..6, [7]]
        1..6  | 2    | [[1, 2], [3, 4], [5, 6]]
    }

    def "Sliding window of #input with size #size and some overlapping is #output"() {
        expect:
        input.stream().collect(sliding(size, 2)) == output

        where:
        input | size | output
        []    | 5    | []
        1..4  | 5    | [[1, 2, 3, 4]]
        1..7  | 3    | [1..3, 3..5, 5..7]
        1..6  | 4    | [1..4, 3..6]
        1..9  | 4    | [1..4, 3..6, 5..8, 7..9]
        1..10 | 4    | [1..4, 3..6, 5..8, 7..10]
        1..11 | 4    | [1..4, 3..6, 5..8, 7..10, 9..11]
    }

    def "Sliding window of #input with size #size and gap of #gap is #output"() {
        expect:
        input.stream().collect(sliding(size, size + gap)) == output

        where:
        input | size | gap | output
        []    | 5    | 1   | []
        1..9  | 4    | 2   | [1..4, 7..9]
        1..10 | 4    | 2   | [1..4, 7..10]
        1..11 | 4    | 2   | [1..4, 7..10]
        1..12 | 4    | 2   | [1..4, 7..10]
        1..13 | 4    | 2   | [1..4, 7..10, [13]]
        1..13 | 5    | 1   | [1..5, 7..11, [13]]
        1..12 | 5    | 3   | [1..5, 9..12]
        1..13 | 5    | 3   | [1..5, 9..13]
    }

    def "Sampling #input taking every #nth th element is #output"() {
        expect:
        input.stream().collect(sliding(1, nth)) == output

        where:
        input  | nth | output
        []     | 1   | []
        []     | 5   | []
        1..3   | 5   | [[1]]
        1..6   | 2   | [[1], [3], [5]]
        1..10  | 5   | [[1], [6]]
        1..100 | 30  | [[1], [31], [61], [91]]
    }
}

Answer 5 (score: 0)

Another option is to implement a custom Spliterator, as done here:

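import java.util.*;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;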

public class SlidingWindowSpliterator<T> implements Spliterator<Stream<T>> {

    static <T> Stream<Stream<T>> windowed(Collection<T> source, int windowSize) {
        return StreamSupport.stream(
          new SlidingWindowSpliterator<>(source, windowSize), false);
    }

    private final Queue<T> buffer;
    private final Iterator<T> sourceIterator;
    private final int windowSize;
    private final int size;

    private SlidingWindowSpliterator(Collection<T> source, int windowSize) {
        this.buffer = new ArrayDeque<>(windowSize);
        this.sourceIterator = Objects.requireNonNull(source).iterator();
        this.windowSize = windowSize;
        this.size = calculateSize(source, windowSize);
    }

    @Override
    public boolean tryAdvance(Consumer<? super Stream<T>> action) {
        if (windowSize < 1) {
            return false;
        }

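        while (sourceIterator.hasNext()) {
            buffer.add(sourceIterator.next());

            if (buffer.size() == windowSize) {
                // Emit a snapshot of the current window, then slide forward by one element.
                action.accept(new ArrayList<>(buffer).stream());
                buffer.poll();
                // An element was passed to action, so return true as the Spliterator contract requires.
                return true;
            }
        }

        return false;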
    }

    @Override
    public Spliterator<Stream<T>> trySplit() {
        return null;
    }

    @Override
    public long estimateSize() {
       return size;
    }

    @Override
    public int characteristics() {
        return ORDERED | NONNULL | SIZED;
    }

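    private static int calculateSize(Collection<?> source, int windowSize) {
        // tryAdvance() yields nothing for windowSize < 1, so report a size of 0 in that case too.
        return windowSize < 1 || source.size() < windowSize
          ? 0
          : source.size() - windowSize + 1;
    }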
}