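I need to create a service that parses information from a paginated website and returns an iterator over the parsed information.
To do this, I often use streams to parse the links. However, I noticed that if I call iterator() on a Java stream that has a flatMap call, each flat-mapped stream is read in its entirety before its first element is returned. If one of those streams takes a long time to finish, or is infinite, the resulting iterator never returns an element.
Is this by design? Should I be doing this differently? See the sample code below, and note how the output changes between forEach() and iterator().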
package temp;

import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

import com.google.common.collect.AbstractIterator;

public class StreamTest {

    public static void main(String[] args) {
        // set first iterable max index randomly
        int MAX_INDEX = ThreadLocalRandom.current().nextInt(10, 20 + 1);
        System.out.println("max index: " + MAX_INDEX);

        // create slow iterable
        Iterable<String> iterable1 = () -> new AbstractIterator<String>() {
            private int index = -1;

            @Override
            protected String computeNext() {
                index++;
                if (index >= MAX_INDEX) {
                    return this.endOfData();
                }
                System.out.println("dummy computing index: " + index);
                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    throw new java.lang.RuntimeException(e);
                }
                return "iterable " + index;
            }
        };

        // create list
        Iterable<String> iterable2 = Arrays.asList("list index 1", "list index 2", "list index 3");

        // create a stream supplier
        Supplier<Stream<String>> streamSupplier = () -> Arrays.asList(iterable1, iterable2).stream()
                .flatMap(i -> StreamSupport.stream(i.spliterator(), false));

        // print using for each
        System.out.println("\n***testing for each***");
        streamSupplier.get().forEach(str -> {
            System.out.println("for each - " + str);
        });

        // print using the iterator
        System.out.println("\n***testing iterator***");
        Iterator<String> iter = streamSupplier.get().iterator();
        while (iter.hasNext()) {
            System.out.println("iterator - " + iter.next());
        }
    }
}
Here is the output of the above:
max index: 12
***testing for each***
dummy computing index: 0
for each - iterable 0
dummy computing index: 1
for each - iterable 1
dummy computing index: 2
for each - iterable 2
dummy computing index: 3
for each - iterable 3
dummy computing index: 4
for each - iterable 4
dummy computing index: 5
for each - iterable 5
dummy computing index: 6
for each - iterable 6
dummy computing index: 7
for each - iterable 7
dummy computing index: 8
for each - iterable 8
dummy computing index: 9
for each - iterable 9
dummy computing index: 10
for each - iterable 10
dummy computing index: 11
for each - iterable 11
for each - list index 1
for each - list index 2
for each - list index 3
***testing iterator***
dummy computing index: 0
dummy computing index: 1
dummy computing index: 2
dummy computing index: 3
dummy computing index: 4
dummy computing index: 5
dummy computing index: 6
dummy computing index: 7
dummy computing index: 8
dummy computing index: 9
dummy computing index: 10
dummy computing index: 11
iterator - iterable 0
iterator - iterable 1
iterator - iterable 2
iterator - iterable 3
iterator - iterable 4
iterator - iterable 5
iterator - iterable 6
iterator - iterable 7
iterator - iterable 8
iterator - iterable 9
iterator - iterable 10
iterator - iterable 11
iterator - list index 1
iterator - list index 2
iterator - list index 3
Shouldn't iterator() and forEach() produce the same output?
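For comparison, here is a minimal sketch of the behavior I was expecting, written without flatMap. The class name LazyFlatten and the helper flatten(...) are hypothetical names made up for this illustration; they are not part of the JDK or Guava. The helper walks the outer Iterable and only asks an inner iterator for an element when the caller asks for one, so a slow or infinite source should not block the first element. The counting source in main is a stand-in for the slow iterable above (without the sleep).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyFlatten {

    // Hypothetical helper: flattens an Iterable of Iterables one element at a time.
    // An inner iterator is opened only when the previous one is exhausted.
    static <T> Iterator<T> flatten(Iterable<? extends Iterable<? extends T>> sources) {
        Iterator<? extends Iterable<? extends T>> outer = sources.iterator();
        return new Iterator<T>() {
            private Iterator<? extends T> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Skip over exhausted (or empty) inner iterators.
                while (!current.hasNext() && outer.hasNext()) {
                    current = outer.next().iterator();
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        // Counts how many elements the first source has produced so far.
        AtomicInteger computed = new AtomicInteger();

        // Stand-in for the slow iterable: three elements, counted as they are produced.
        Iterable<String> slow = () -> new Iterator<String>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < 3;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                computed.incrementAndGet();
                return "iterable " + index++;
            }
        };
        Iterable<String> list = Arrays.asList("list index 1", "list index 2");

        Iterator<String> it = flatten(Arrays.asList(slow, list));
        while (it.hasNext()) {
            String value = it.next();
            System.out.println("iterator - " + value + " (computed so far: " + computed.get() + ")");
        }
    }
}
```

With this version the counter goes up by one per element returned, which is the interleaving that forEach() shows in the output above. The trade-off is that it works on Iterables rather than Streams, so any stream operations would have to be applied per element or before flattening.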