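I have a stream of versioned documents, sorted by document ID and version.
For example: Av1, Av2, Bv1, Cv1, Cv2
I have to convert it into another Stream whose records are aggregated by document ID:
A [v1, v2], B [v1], C [v1, v2]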
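Can this be done without Collectors.groupingBy()? I don't want to use groupingBy() because it loads every item of the stream into memory before grouping. In theory the whole stream never has to be in memory at once, because it is already ordered by ID.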
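Using only the JDK, one way to exploit the ordering is to wrap the source Iterator in a second Iterator that reads one element ahead, so only the group currently being built is held in memory. This is a minimal sketch, not taken from any answer below. It assumes equal IDs are adjacent in the input, and it uses plain strings such as "Av1" (with the first character as the ID) in place of a document class. The helper name `groupAdjacent` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class GroupAdjacent {

    // Hypothetical helper: lazily groups consecutive elements with equal keys.
    // Assumes the input is already sorted (or at least clustered) by key.
    static <T, K> Stream<List<T>> groupAdjacent(Stream<T> source, Function<? super T, ? extends K> key) {
        Iterator<T> it = source.iterator();
        Iterator<List<T>> groups = new Iterator<List<T>>() {
            private T lookahead;          // first element of the next group, already read
            private boolean hasLookahead; // separate flag so null elements are handled too

            @Override
            public boolean hasNext() {
                return hasLookahead || it.hasNext();
            }

            @Override
            public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T first = hasLookahead ? lookahead : it.next();
                hasLookahead = false;
                lookahead = null;
                K groupKey = key.apply(first);
                List<T> group = new ArrayList<>();
                group.add(first);
                // Keep reading while the key stays the same; stop at the first different key.
                while (it.hasNext()) {
                    T candidate = it.next();
                    if (!Objects.equals(key.apply(candidate), groupKey)) {
                        lookahead = candidate;
                        hasLookahead = true;
                        break;
                    }
                    group.add(candidate);
                }
                return group;
            }
        };
        // Wrap the grouping iterator back into a sequential, ordered Stream.
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(groups, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        groupAdjacent(Stream.of("Av1", "Av2", "Bv1", "Cv1", "Cv2"), s -> s.substring(0, 1))
                .forEach(System.out::println); // [Av1, Av2], then [Bv1], then [Cv1, Cv2]

        // Groups are pulled on demand, so an infinite source works with limit().
        List<List<Integer>> firstTwo = groupAdjacent(Stream.iterate(0, i -> i + 1), i -> i / 3)
                .limit(2)
                .collect(Collectors.toList());
        System.out.println(firstTwo); // [[0, 1, 2], [3, 4, 5]]
    }
}
```

The infinite-stream call in `main` is a quick way to check that the helper does not read the whole source: it returns after pulling just one element past the second group.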
Answer 0 (score: 1)
Here is the solution I came up with:
import java.util.Iterator;
import java.util.function.Supplier;
import java.util.stream.Stream;

Stream<Document> stream = Stream.of(
new Document("A", "v1"),
new Document("A", "v2"),
new Document("B", "v1"),
new Document("C", "v1"),
new Document("C", "v2")
);
Iterator<Document> iterator = stream.iterator();
Stream<GroupedDocument> result = Stream.generate(new Supplier<GroupedDocument>() {
// First document of the next group; it has already been read from the iterator
Document lastDoc = null;
@Override
public GroupedDocument get() {
Document doc = lastDoc != null ? lastDoc : (iterator.hasNext() ? iterator.next() : null);
lastDoc = null;
if (doc == null) {
return null; // input exhausted: from here on the stream yields only nulls
}
GroupedDocument gd = new GroupedDocument(doc.getId());
gd.getVersions().add(doc.getVersion());
// Consume documents for as long as they share the current id
while (iterator.hasNext()) {
Document next = iterator.next();
if (!next.getId().equals(gd.getId())) {
lastDoc = next; // belongs to the next group; keep it for the next call
break;
}
gd.getVersions().add(next.getVersion());
}
return gd;
}
});
Here are the Document and GroupedDocument classes:
import java.util.ArrayList;
import java.util.List;

class Document {
private String id;
private String version;
public Document(String id, String version) {
this.id = id;
this.version = version;
}
public String getId() {
return id;
}
public String getVersion() {
return version;
}
}
class GroupedDocument {
private String id;
private List<String> versions;
public GroupedDocument(String id) {
this.id = id;
versions = new ArrayList<>();
}
public String getId() {
return id;
}
public List<String> getVersions() {
return versions;
}
@Override
public String toString() {
return "GroupedDocument{" +
"id='" + id + '\'' +
", versions=" + versions +
'}';
}
}
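Note that the resulting stream is infinite: after all the groups it yields an endless sequence of nulls. In Java 9 you can use takeWhile to take all elements up to the first null, or see this post.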
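Here is a minimal sketch of that termination step, assuming Java 9+. The `nullTerminated` helper is a hypothetical stand-in for the grouping Supplier above: it returns items and then null forever. Besides `takeWhile`, the three-argument `Stream.iterate(seed, hasNext, next)` added in Java 9 also stops at the first null.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NullTerminatedDemo {
    // Hypothetical stand-in for the grouping Supplier:
    // yields the next item, then null forever once the source is exhausted.
    static Supplier<String> nullTerminated(List<String> items) {
        Iterator<String> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Java 9+: cut the infinite generate() stream at the first null.
    static List<String> withTakeWhile(Supplier<String> supplier) {
        return Stream.generate(supplier)
                .takeWhile(Objects::nonNull)
                .collect(Collectors.toList());
    }

    // Java 9+ alternative: the three-argument iterate() is ordered and stops at the first null.
    static List<String> withIterate(Supplier<String> supplier) {
        return Stream.iterate(supplier.get(), Objects::nonNull, ignored -> supplier.get())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> groups = Arrays.asList("A[v1, v2]", "B[v1]", "C[v1, v2]");
        System.out.println(withTakeWhile(nullTerminated(groups)));
        System.out.println(withIterate(nullTerminated(groups)));
    }
}
```

One caveat: `Stream.generate` produces an unordered stream, and the `takeWhile` Javadoc only guarantees the longest matching prefix for ordered streams. In a sequential pipeline like this one it stops at the first null in practice. The `iterate` form is ordered by definition.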
Answer 1 (score: 0)
Would a Map<String, Stream<String>> give you what you need?
A-v1,v2
B-v1
C-v1,v2
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

String[] docs = { "Av1", "Av2", "Bv1", "Cv1", "Cv2"};
Map<String, Stream<String>> map = Stream.<String>of(docs).
map(s ->s.substring(0, 1)).distinct(). //leave only A B C
collect(Collectors.toMap( s1 -> s1, //A B C as keys
s1 ->Stream.<String>of(docs). //value is filtered stream of docs
filter(s2 -> s1.substring(0, 1).
equals(s2.substring(0, 1)) ).
map(s3 -> s3.substring(1, s3.length())) //trim A B C
));
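To get the A-v1,v2 form shown above out of that map, each value stream still has to be consumed. This is a minimal sketch assuming the same input array. `joined` is a hypothetical helper name, the TreeMap only makes the key order deterministic, and `startsWith(id)` replaces the substring comparison (equivalent here because every ID is one character).

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PrintGroups {
    // Builds the same Map<String, Stream<String>> as the answer, then joins each value stream.
    static Map<String, String> joined(String[] docs) {
        Map<String, Stream<String>> map = Arrays.stream(docs)
                .map(s -> s.substring(0, 1))      // leave only A, B, C
                .distinct()
                .collect(Collectors.toMap(
                        id -> id,
                        id -> Arrays.stream(docs)  // value: lazily filtered stream of docs
                                .filter(d -> d.startsWith(id))
                                .map(d -> d.substring(1))));
        // Each value is a Stream, so it can be consumed only once; join it into a String here.
        Map<String, String> result = new TreeMap<>();
        map.forEach((id, versions) -> result.put(id, versions.collect(Collectors.joining(","))));
        return result;
    }

    public static void main(String[] args) {
        String[] docs = {"Av1", "Av2", "Bv1", "Cv1", "Cv2"};
        // prints A-v1,v2, then B-v1, then C-v1,v2
        joined(docs).forEach((id, versions) -> System.out.println(id + "-" + versions));
    }
}
```

This approach still needs the whole `docs` array in memory and rescans it once per ID, so it does not remove the memory cost the question is concerned about.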
Answer 2 (score: 0)
You can use groupRuns from the StreamEx library:
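import one.util.streamex.StreamEx;
import java.util.List;
import static java.util.Arrays.asList;
import static java.lang.System.out;

class Document {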
public String id;
public int version;
public Document(String id, int version) {
this.id = id;
this.version = version;
}
public String toString() {
return "Document{"+id+version+ "}";
}
}
public class MyClass {
private static List<Document> docs = asList(
new Document("A", 1),
new Document("A", 2),
new Document("B", 1),
new Document("C", 1),
new Document("C", 2)
);
public static void main(String args[]) {
StreamEx<List<Document>> groups = StreamEx.of(docs).groupRuns((l, r) -> l.id.equals(r.id));
// Print each group as it is produced instead of collecting all groups into a List first
groups.forEach(out::println);
}
}
Output:
[Document{A1}, Document{A2}]
[Document{B1}]
[Document{C1}, Document{C2}]
I can't verify that this doesn't consume the whole stream, but given what groupRuns means, I can't imagine why it would need to.