Question

I'm trying to sort the results of a stream filter by a field.

Input documents / sample records:

DocumentList: [
    Document{
        {
            _id=5975ff00a213745b5e1a8ed9,
            u_id=,
            mailboxcontent_id=5975ff00a213745b5e1a8ed8,                
            idmapping=Document{
                {ptype=PDF, cid=00988, normalizedcid=00988, systeminstanceid=, sourceschemaname=, pid=0244810006}
            },
            batchid=null,
            pdate=Tue Jul 11 17:52:25 IST 2017,
            locale=en_US
        }
    },
    Document{
        {
            _id=597608aba213742554f537a6,
            u_id=,
            mailboxcontent_id=597608aba213742554f537a3, 
            idmapping=Document{
                {platformtype=PDF, cid=00999, normalizedcid=00999, systeminstanceid=, sourceschemaname=, pid=0244810006}
            },
            batchid=null,
            pdate=Fri Jul 28 01:26:22 IST 2017,
            locale=en_US
        }
    }
]

Here, I need to sort by pdate.

List<Document> outList = documentList.stream()
    .filter(p -> p.getInteger(CommonConstants.VISIBILITY) == 1)
    .parallel()
    .sequential()
    .collect(Collectors.toCollection(ArrayList::new))
    .sort()
    .skip(skipValue)
    .limit(limtValue);

I'm not sure how to do the equivalent of

"order by pdate DESC"

Thanks in advance!

Answer 1

You can use the .sorted() Stream API method:

.sorted(Comparator.comparing(Document::getPDate).reversed())
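The Document::getPDate reference assumes a getter that MongoDB's org.bson.Document does not provide. org.bson.Document implements Map<String, Object>, and the pdate values in the question look like java.util.Date.toString() output, so the sketch below assumes pdate holds a Date and uses a plain Map<String, Object> as a stand-in so it runs without the driver (with the real class, doc.getDate("pdate") could replace the cast). The class name, the doc helper and the sample values are made up for illustration. Note the explicit lambda parameter type: with a bare lambda such as d -> ..., chaining .reversed() leaves the compiler inferring Object for d, and the code no longer compiles.

```java
import java.util.*;

public class PdateComparatorSketch {

    // Orders documents by their "pdate" field, newest first (SQL: ORDER BY pdate DESC).
    // Documents are modelled as Map<String, Object>, the interface org.bson.Document implements.
    static Comparator<Map<String, Object>> byPdateDesc() {
        // The explicit parameter type is required: with a bare lambda, the chained
        // reversed() call keeps the compiler from inferring Map<String, Object>.
        return Comparator.comparing((Map<String, Object> d) -> (Date) d.get("pdate")).reversed();
    }

    // Hypothetical helper that builds a sample document.
    static Map<String, Object> doc(String id, long pdateMillis) {
        Map<String, Object> d = new HashMap<>();
        d.put("_id", id);
        d.put("pdate", new Date(pdateMillis));
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>(Arrays.asList(
                doc("a", 1_000L), doc("b", 3_000L), doc("c", 2_000L)));
        docs.sort(byPdateDesc());
        for (Map<String, Object> d : docs) {
            System.out.println(d.get("_id")); // prints b, c, a (one per line)
        }
    }
}
```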

Full, refactored example:

List<Document> outList = documentList.stream()
  .filter(p -> p.getInteger(CommonConstants.VISIBILITY) == 1)
  .sorted(Comparator.comparing(Document::getPDate).reversed())
  .skip(skipValue).limit(limtValue)
  .collect(Collectors.toCollection(ArrayList::new));
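A self-contained sketch of the same pipeline, using Map<String, Object> as a stand-in for org.bson.Document, assuming pdate holds a java.util.Date, and using the hypothetical literal key "visibility" in place of CommonConstants.VISIBILITY, whose value the question does not show. It compares with Integer.valueOf(1).equals(...) rather than ==, so a document without the field is filtered out instead of throwing a NullPointerException when a null Integer is unboxed.

```java
import java.util.*;
import java.util.stream.Collectors;

public class PagedPdateQuery {

    // Hypothetical field names; the question's CommonConstants.VISIBILITY value is not shown.
    static final String VISIBILITY = "visibility";
    static final String PDATE = "pdate";

    // Roughly: WHERE visibility = 1 ORDER BY pdate DESC, then skip/limit for paging.
    static List<Map<String, Object>> page(List<Map<String, Object>> documentList,
                                          long skipValue, long limitValue) {
        Comparator<Map<String, Object>> byPdateDesc =
                Comparator.comparing((Map<String, Object> d) -> (Date) d.get(PDATE)).reversed();
        return documentList.stream()
                // equals() instead of == so a missing field (null) is skipped, not an NPE
                .filter(p -> Integer.valueOf(1).equals(p.get(VISIBILITY)))
                .sorted(byPdateDesc)           // stateful: buffers every matching element
                .skip(skipValue)
                .limit(limitValue)
                .collect(Collectors.toList()); // terminal operation comes last
    }

    // Hypothetical helper that builds a sample document.
    static Map<String, Object> doc(String id, int visibility, long pdateMillis) {
        Map<String, Object> d = new HashMap<>();
        d.put("_id", id);
        d.put(VISIBILITY, visibility);
        d.put(PDATE, new Date(pdateMillis));
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = Arrays.asList(
                doc("a", 1, 1_000L), doc("b", 0, 5_000L),
                doc("c", 1, 3_000L), doc("d", 1, 2_000L));
        // Visible docs newest first: c, d, a; skip 1, take 2 -> d, a
        page(docs, 1, 2).forEach(d -> System.out.println(d.get("_id")));
    }
}
```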

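A few things to keep in mind:

If you don't care about the List implementation, use Collectors.toList()
collect() is a terminal operation, so it must be the last operation in the pipeline
.parallel().sequential() is completely useless: if you want parallelism, stick with .parallel(); if not, don't write anything, since streams are sequential by default
The entire Stream will be loaded into memory in order to be sorted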
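A small sketch that checks these points on a toy stream of numbers (the class and method names are only for illustration): isParallel() reports the mode the pipeline will run in, and the last parallel()/sequential() call wins for the whole pipeline.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamModeSketch {

    // The last call to parallel()/sequential() decides how the WHOLE pipeline runs,
    // so .parallel().sequential() is simply a sequential stream.
    static boolean parallelThenSequential() {
        return Stream.of(3, 1, 2).parallel().sequential().isParallel();
    }

    // Streams are sequential unless parallel() is requested.
    static boolean defaultMode() {
        return Stream.of(3, 1, 2).isParallel();
    }

    // Collectors.toList() when the concrete List type does not matter;
    // collect() is terminal, so it is the last call on the stream.
    static List<Integer> sortedDesc() {
        return Stream.of(3, 1, 2)
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parallelThenSequential()); // false
        System.out.println(defaultMode());            // false
        System.out.println(sortedDesc());             // [3, 2, 1]
    }
}
```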

Answer 2

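An alternative to pivovarit's answer, which might be useful if your data set is potentially too large to hold in memory all at once (a sorted Stream has to keep the entire underlying data set in an intermediate container to be able to sort it correctly).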

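We won't use the stream sort operation here. Instead, we'll use a data structure that holds at most as many elements as we tell it to, and pushes out the extra elements according to the sort criteria (I don't claim to provide the best implementation here, just the idea of it).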

To achieve this, we need a custom collector:

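// Imports this snippet needs to compile
import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Keeps at most maxSize elements, in comparator order, while the stream is collected
class SortedPileCollector<E> implements Collector<E, SortedSet<E>, List<E>> {
  private final int maxSize;
  private final Comparator<E> comptr;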

  public SortedPileCollector(int maxSize, Comparator<E> comparator) {
    if (maxSize < 1) {
      throw new IllegalArgumentException("Max size cannot be " + maxSize);
    }
    this.maxSize = maxSize;
    comptr = Objects.requireNonNull(comparator);
  }

  public Supplier<SortedSet<E>> supplier() {
    return () -> new TreeSet<>(comptr);
  }

  public BiConsumer<SortedSet<E>, E> accumulator() {
    return this::accumulate; // see below
  }

  public BinaryOperator<SortedSet<E>> combiner() {
    return this::combine;
  }

  public Function<SortedSet<E>, List<E>> finisher() {
    return set -> new ArrayList<>(set);
  }

  public Set<Characteristics> characteristics() {
    return EnumSet.of(Characteristics.UNORDERED);
  }

  // The interesting part
  public void accumulate(SortedSet<E> set, E el) {
    Objects.requireNonNull(el);
    Objects.requireNonNull(set);
    if (set.size() < maxSize) {
      set.add(el);
    }
    else {
      if (set.contains(el)) {
        return; // we already have this element
      }
      E tailEl = set.last();
      Comparator<E> c = set.comparator();
      if (c.compare(tailEl, el) <= 0) {
        // If we did not have capacity, received element would've gone to the end of our set.
        // However, since we are at capacity, we will skip the element
        return;
      }
      else {
        // We received element that we should preserve.
        // Remove set tail and add our new element.
        set.remove(tailEl);
        set.add(el);
      }
    }
  }

  public SortedSet<E> combine(SortedSet<E> first, SortedSet<E> second) {
    SortedSet<E> result = new TreeSet<>(first);
    second.forEach(el -> accumulate(result, el)); // inefficient, but hopefully you see the general idea.
    return result;
  }
}

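The collector above acts as a mutable structure that manages a sorted data set. Note that this implementation ignores "duplicate" elements. Because TreeSet decides equality with the comparator, that means any two elements the comparator ranks as equal (for example, two documents with the same pdate), not only identical ones. If you want to allow duplicates, you need to change the implementation.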

To use it in your case with this comparator, assuming you need the top three elements:

Comparator<Document> comparator = Comparator.comparing(Document::getPDate).reversed(); // see pivovarit's answer
List<Document> outList = documentList.stream()
  .filter(p -> p.getInteger(CommonConstants.VISIBILITY) == 1)
  .collect(new SortedPileCollector<>(3, comparator));
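If you need to keep duplicates, as noted above, one option is to replace the TreeSet with a bounded heap: a java.util.PriorityQueue ordered so that its head is the element that would be evicted first. This is a swapped-in technique rather than the answer's collector, written with Collector.of; the class name BoundedTopK and the method name topK are hypothetical. Unlike a TreeSet, a PriorityQueue keeps elements that compare as equal.

```java
import java.util.*;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class BoundedTopK {

    // Keeps the first maxSize elements in `comparator` order, duplicates included.
    // The heap is ordered by the REVERSED comparator, so peek() is the element
    // that would be evicted first (the current "worst" of the kept ones).
    static <E> Collector<E, ?, List<E>> topK(int maxSize, Comparator<? super E> comparator) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("Max size cannot be " + maxSize);
        }
        Comparator<E> worstFirst = (a, b) -> comparator.compare(b, a);
        return Collector.of(
                () -> new PriorityQueue<E>(maxSize, worstFirst),
                (PriorityQueue<E> heap, E el) -> offer(heap, el, maxSize, comparator),
                (PriorityQueue<E> left, PriorityQueue<E> right) -> {
                    right.forEach(el -> offer(left, el, maxSize, comparator));
                    return left;
                },
                (PriorityQueue<E> heap) -> {
                    List<E> result = new ArrayList<>(heap);
                    result.sort(comparator); // heap iteration order is unspecified
                    return result;
                },
                Collector.Characteristics.UNORDERED);
    }

    private static <E> void offer(PriorityQueue<E> heap, E el, int maxSize,
                                  Comparator<? super E> comparator) {
        if (heap.size() < maxSize) {
            heap.add(el);
        } else if (comparator.compare(el, heap.peek()) < 0) {
            heap.poll(); // evict the current worst element
            heap.add(el);
        }
        // otherwise el sorts at or after everything kept, so it is skipped
    }

    public static void main(String[] args) {
        List<Integer> top3 = Stream.of(5, 1, 9, 9, 3, 7)
                .collect(topK(3, Comparator.<Integer>reverseOrder()));
        System.out.println(top3); // [9, 9, 7]
    }
}
```

In the question's setting this would be called as .collect(BoundedTopK.topK(3, comparator)) with the comparator defined above.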

Answer 3

Once you have the result list, and assuming Document.getPDate() returns pDate, do this:

Collections.sort(outList, Comparator.comparing(Document::getPDate).reversed());
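A minimal runnable sketch of this in-place approach, using a hypothetical Doc class that has the getPDate() getter the answer assumes. Collections.sort is guaranteed to be stable, so documents with the same pdate keep their original relative order; the list must also be modifiable, which the ArrayList produced by Collectors.toCollection(ArrayList::new) is.

```java
import java.util.*;

public class InPlacePdateSort {

    // Hypothetical stand-in for a document type that exposes getPDate().
    static final class Doc {
        private final String id;
        private final Date pdate;

        Doc(String id, Date pdate) {
            this.id = id;
            this.pdate = pdate;
        }

        String getId() { return id; }
        Date getPDate() { return pdate; }
    }

    // Sorts the already-collected list in place, newest pdate first.
    static void sortByPdateDesc(List<Doc> outList) {
        Collections.sort(outList, Comparator.comparing(Doc::getPDate).reversed());
    }

    public static void main(String[] args) {
        List<Doc> outList = new ArrayList<>(Arrays.asList(
                new Doc("a", new Date(1_000L)),
                new Doc("b", new Date(3_000L)),
                new Doc("c", new Date(3_000L)),   // same pdate as "b"
                new Doc("d", new Date(2_000L))));
        sortByPdateDesc(outList);
        for (Doc d : outList) {
            System.out.println(d.getId()); // b, c, d, a (stable: b stays before c)
        }
    }
}
```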

Java 8 stream filter - sort based on pdate
