Question

I want to find the set of all words in a file. The set should be sorted, and case does not matter. Here is my approach:

public static Set<String> setOfWords(String fileName) throws IOException {

    Set<String> wordSet;
    Stream<String> stream = java.nio.file.Files.lines(java.nio.file.Paths.get(fileName));

    wordSet = stream
                .map(line -> line.split("[ .,;?!.:()]"))
                .flatMap(Arrays::stream)
                .sorted()
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    stream.close();
    return wordSet;
}

Test file:

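This is a file with five lines. It has two sentences, and the word file is contained in multiple lines of this file. Can this file be used for testing?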

When I print the set, I get the following output:

Set of words: 
a
be
in
sentences
testing
this
for
multiple
is
it
used
two
the
can
with
contained
file
and
of
has
lines
five
word

Can someone tell me why this set is not sorted in natural order (lexicographic, for Strings)?

Thanks in advance.
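Two things combine in the code above. Collectors.toSet() makes no guarantee about the type of Set it returns (in current OpenJDK it is a HashSet), so whatever order sorted() produced is not preserved when iterating the result. And sorted() runs before toLowerCase, so the sort is case-sensitive. A minimal sketch of the first point, with a hard-coded word list standing in for the file; the class name OrderDemo is hypothetical, and LinkedHashSet (which iterates in insertion order) is a swapped-in collection not used in the question or answers:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class OrderDemo {

    // Lower-cases, sorts, and collects into a LinkedHashSet. Because a
    // LinkedHashSet iterates in insertion order, the sorted encounter
    // order of the stream survives collection; toSet() does not promise this.
    static Set<String> sortedDistinct(List<String> words) {
        return words.stream()
                .map(String::toLowerCase)   // normalise case first
                .sorted()                   // natural (lexicographic) order
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("This", "is", "a", "file", "this", "file");
        System.out.println(sortedDistinct(words)); // [a, file, is, this]
    }
}
```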

Answer 1

Use a sorted collection such as TreeSet, with String.CASE_INSENSITIVE_ORDER as the Comparator:

Set<String> set = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .collect(Collectors.toCollection(()-> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));
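A self-contained version of the TreeSet approach, with two in-memory lines standing in for Files.lines (the class and method names are illustrative). Because the comparator treats "This" and "this" as equal, the TreeSet keeps whichever spelling it sees first and ignores the later one; adding .map(String::toLowerCase) before collecting would give uniformly lower-case output.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CaseInsensitiveTreeSetDemo {

    // Splits each line into words and collects them into a TreeSet ordered by
    // String.CASE_INSENSITIVE_ORDER; words that are equal ignoring case are
    // treated as duplicates, and the first spelling encountered is kept.
    static Set<String> words(Stream<String> lines) {
        return lines
                .map(line -> line.split("[ .,;?!.:()]"))
                .flatMap(Arrays::stream)
                .collect(Collectors.toCollection(
                        () -> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));
    }

    public static void main(String[] args) {
        // In-memory stand-in for Files.lines(...)
        Stream<String> lines = Stream.of("This is a file.", "Can this file be used?");
        System.out.println(words(lines)); // [a, be, Can, file, is, This, used]
    }
}
```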

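Alternatively, sort the elements with a case-insensitive comparator and collect them into a collection that maintains insertion order, such as a List: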

List<String> list = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .sorted(String::compareToIgnoreCase)
            .distinct()
            .collect(Collectors.toList());
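One consequence of this variant worth knowing: distinct() compares elements with equals(), not with the comparator, so spellings that differ only in case (such as "This" and "this") both survive, side by side. Since sorted() is stable on an ordered stream, they keep their encounter order. A runnable sketch with in-memory input (names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortedDistinctDemo {

    // Sorts case-insensitively, then drops exact (equals-based) duplicates.
    // Words differing only in case are NOT merged by distinct().
    static List<String> words(Stream<String> lines) {
        return lines
                .map(line -> line.split("[ .,;?!.:()]"))
                .flatMap(Arrays::stream)
                .sorted(String::compareToIgnoreCase)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> lines = Stream.of("This is a file.", "Can this file be used?");
        System.out.println(words(lines)); // [a, be, Can, file, is, This, this, used]
    }
}
```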

Answer 2

Since sorting is case-sensitive, you should map to lower case before sorting.

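Besides that, you should collect the output into an ordered collection, such as a List or a SortedSet implementation (if you use a SortedSet you don't need sorted(), since the set keeps its elements sorted anyway).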
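Why the position of map(String::toLowerCase) matters: natural String order compares UTF-16 code units, and every ASCII upper-case letter sorts before every ASCII lower-case letter, so sorting first and lower-casing afterwards can leave the result out of order. A small sketch comparing both orders (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapOrderDemo {

    // Question's order: case-sensitive sort, then lower-case.
    static List<String> sortThenLower(List<String> words) {
        return words.stream()
                .sorted()
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    // Answer's order: lower-case first, then sort.
    static List<String> lowerThenSort(List<String> words) {
        return words.stream()
                .map(String::toLowerCase)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("this", "It", "a", "Can");
        System.out.println(sortThenLower(words)); // [can, it, a, this]
        System.out.println(lowerThenSort(words)); // [a, can, it, this]
    }
}
```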

List output:

List<String> wordSet = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .map(String::toLowerCase)
            .sorted()
            .collect(Collectors.toList());

Edit:

As Hank commented, if you want to eliminate duplicates from the output collection, a List won't do; you have to collect the elements into a SortedSet implementation.

SortedSet output:

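SortedSet<String> wordSet = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .map(String::toLowerCase)
            .collect(Collectors.toCollection(TreeSet::new));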
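Putting the pieces together, a sketch of the question's method rewritten along these lines. Two changes go beyond the answers and should be read as assumptions: the stream is opened in try-with-resources so the file is closed even if the pipeline throws, and empty tokens are filtered out, because split() yields an empty string between two adjacent delimiters (such as the ", " in the test file). The class name WordSetDemo and the temporary file in main are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordSetDemo {

    static SortedSet<String> setOfWords(String fileName) throws IOException {
        // try-with-resources closes the file even if the pipeline throws
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return lines
                    .map(line -> line.split("[ .,;?!.:()]"))
                    .flatMap(Arrays::stream)
                    .filter(word -> !word.isEmpty())   // drop empty tokens from adjacent delimiters
                    .map(String::toLowerCase)          // normalise case before ordering
                    .collect(Collectors.toCollection(TreeSet::new)); // sorted, no duplicates
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("words", ".txt");
        Files.write(file, Arrays.asList(
                "This is a file.",
                "It has two sentences, and this file can be used."));
        // [a, and, be, can, file, has, is, it, sentences, this, two, used]
        System.out.println(setOfWords(file.toString()));
        Files.delete(file);
    }
}
```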
