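Why is my string manipulation slow when using lambda expressions?

Time: 2018-01-30 10:37:47

Tags: java string lambda java-8

A method takes comma-separated words as a String and returns a String of comma-separated words that are in natural sort order, contain no 4-letter words, are all in UPPER case, and contain no duplicates. The first approach is quite slow compared to the second. Can you help me understand why, and how I can improve my approach?

Method 1: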

public String stringProcessing(String s){
      Stream<String> tokens = Arrays.stream(s.split(","));
      return tokens.filter(t -> t.length() != 4) .distinct()
                   .sorted() 
                   .collect(Collectors.joining(",")).toUpperCase();
}

Method 2:

public String processing(String s) {
    String[] tokens = s.split(",");
    Set<String> resultSet = new TreeSet<>();
    for(String t:tokens){
        if(t.length() !=  4)
            resultSet.add(t.toUpperCase());
    }        
    StringBuilder result = new StringBuilder();
    resultSet.forEach(key -> {
        result.append(key).append(","); 
    });
    result.deleteCharAt(result.length()-1);
    return result.toString();
}

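3 answers:

Answer 0 (score: 10)

A performance comparison that documents neither the JRE version used, nor the input data sets, nor the benchmark methodology is not suitable for drawing any conclusions.

Further, there is a fundamental difference between your variants. Your first variant processes the original strings when using distinct(), potentially keeping many more elements than the second variant, and joins all of them into one string before converting the complete result string to upper case. In contrast, your second variant converts individual elements before adding them to the set, so only strings with a distinct upper-case representation are processed further. So the second variant may need significantly less memory and process fewer elements when joining.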
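To make the difference concrete, here is a minimal sketch that runs both methods from the question on a mixed-case input. The class name `VariantDifference` and the sample input are assumptions made for illustration, and Method 2 is shortened to use `String.join` instead of the original `StringBuilder` loop.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class VariantDifference {

    // Method 1 from the question: distinct() sees the original strings,
    // upper-casing happens only once, on the joined result.
    static String stringProcessing(String s) {
        return Arrays.stream(s.split(","))
                .filter(t -> t.length() != 4)
                .distinct()
                .sorted()
                .collect(Collectors.joining(","))
                .toUpperCase();
    }

    // Method 2 from the question (joined with String.join for brevity):
    // each token is upper-cased before it is added to the set.
    static String processing(String s) {
        Set<String> resultSet = new TreeSet<>();
        for (String t : s.split(",")) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    public static void main(String[] args) {
        String input = "b,a,A,B"; // illustrative input, not from the question
        System.out.println(stringProcessing(input)); // prints "A,B,A,B"
        System.out.println(processing(input));       // prints "A,B"
    }
}
```

With this input, "a" and "A" are distinct for Method 1, so both survive until the final toUpperCase() call, whereas Method 2 collapses them into a single set entry.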

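So when the two do entirely different things, there is not much sense in comparing the performance of these operations. A better comparison would be between these two variants: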

public String variant1(String s){
    Stream<String> tokens = Arrays.stream(s.split(","));
    return tokens.filter(t -> t.length() != 4)
                 .map(String::toUpperCase)
                 .sorted().distinct()
                 .collect(Collectors.joining(","));
}

public String variant2(String s) {
    String[] tokens = s.split(",");
    Set<String> resultSet = new TreeSet<>();
    for(String t:tokens){
        if(t.length() !=  4)
            resultSet.add(t.toUpperCase());
    }
    return String.join(",", resultSet);
}

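Note that I changed the order of sorted() and distinct(); as discussed in this answer, applying distinct() directly after sorted() allows the distinct operation to exploit the sorted nature of the stream.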
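The idea behind that optimization can be sketched as follows: once the elements are sorted, equal elements are adjacent, so removing duplicates only requires comparing each element with its predecessor instead of tracking every element seen so far. The helper `distinctOfSorted` below is a hypothetical illustration of that idea, assuming non-null elements; it is not the stream library's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SortedDistinctSketch {

    // Hypothetical helper: removes duplicates from an already sorted list
    // by comparing each element only with the one before it.
    static List<String> distinctOfSorted(List<String> sorted) {
        List<String> result = new ArrayList<>();
        String previous = null;
        for (String current : sorted) {
            if (previous == null || !current.equals(previous)) {
                result.add(current);
            }
            previous = current;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("B", "A", "C", "A", "B");
        List<String> sorted = words.stream().sorted().collect(Collectors.toList());

        List<String> viaSketch = distinctOfSorted(sorted);
        List<String> viaStream = words.stream().sorted().distinct().collect(Collectors.toList());

        System.out.println(viaSketch);                   // prints "[A, B, C]"
        System.out.println(viaSketch.equals(viaStream)); // prints "true"
    }
}
```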

You may also consider not creating a temporary array holding all substrings before streaming over them:

public String variant1(String s){
    return Pattern.compile(",").splitAsStream(s)
            .filter(t -> t.length() != 4)
            .map(String::toUpperCase)
            .sorted().distinct()
            .collect(Collectors.joining(","));
}

You may also add a third variant,

public String variant3(String s) {
    Set<String> resultSet = new TreeSet<>();
    int o = 0, p;
    for(p = s.indexOf(','); p>=0; p = s.indexOf(',', o=p+1)) {
        if(p-o == 4) continue;
        resultSet.add(s.substring(o, p).toUpperCase());
    }
    if(s.length()-o != 4) resultSet.add(s.substring(o).toUpperCase());
    return String.join(",", resultSet);
}

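which doesn't create an array of substrings and even skips substring creation for the tokens that don't match the filter criteria. This isn't meant to suggest going this low-level in production code, but rather that there might always be a faster variant, so it doesn't matter whether the variant you're using is the fastest, but whether it runs at reasonable speed while being maintainable.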
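Before benchmarking hand-written variants like these against each other, it is worth checking that they return the same result. The following sketch does that for variant2 and variant3; the class name `VariantCheck` and the sample inputs are assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

public class VariantCheck {

    // variant2 from the answer: split into an array, then collect into a TreeSet.
    static String variant2(String s) {
        String[] tokens = s.split(",");
        Set<String> resultSet = new TreeSet<>();
        for (String t : tokens) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    // variant3 from the answer: scan for commas manually, o is the token start,
    // p the position of the next comma.
    static String variant3(String s) {
        Set<String> resultSet = new TreeSet<>();
        int o = 0, p;
        for (p = s.indexOf(','); p >= 0; p = s.indexOf(',', o = p + 1)) {
            if (p - o == 4) continue; // skip 4-letter tokens without creating a substring
            resultSet.add(s.substring(o, p).toUpperCase());
        }
        if (s.length() - o != 4) resultSet.add(s.substring(o).toUpperCase());
        return String.join(",", resultSet);
    }

    public static void main(String[] args) {
        String input = "a,bb,ccc,dddd,eeeee,a"; // illustrative input
        System.out.println(variant2(input)); // prints "A,BB,CCC,EEEEE"
        System.out.println(variant3(input)); // prints "A,BB,CCC,EEEEE"

        // Trailing comma: split() discards the trailing empty string, variant3 keeps it.
        System.out.println(variant2("a,")); // prints "A"
        System.out.println(variant3("a,")); // prints ",A"
    }
}
```

The second pair of calls shows that the variants are not interchangeable on every input: String.split drops trailing empty strings, so for an input ending in a comma variant3 yields an additional empty token.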

Answer 1 (score: 7)

I guess it was only a matter of time before someone actually posted some JMH tests. I took Holger's methods and tested them:

@BenchmarkMode(value = { Mode.AverageTime })
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class StreamVsLoop {

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(StreamVsLoop.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }

    @Param(value = {
            "a, b, c",
            "a, bb, ccc, dddd, eeeee, ffffff, ggggggg, hhhhhhhh",
            "a, bb, ccc, dddd, eeeee, ffffff, ggggggg, hhhhhhhh, ooooooooo, tttttttttttttt, mmmmmmmmmmmmmmmmmm" })
    String s;

    @Benchmark
    @Fork(1)
    public String stream() {
        Stream<String> tokens = Arrays.stream(s.split(","));
        return tokens.filter(t -> t.length() != 4)
                .map(String::toUpperCase)
                .sorted().distinct()
                .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String loop() {
        String[] tokens = s.split(",");
        Set<String> resultSet = new TreeSet<>();
        for (String t : tokens) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    @Benchmark
    @Fork(1)
    public String sortedDistinct() {
        return Pattern.compile(",").splitAsStream(s)
                .filter(t -> t.length() != 4)
                .map(String::toUpperCase)
                .sorted()
                .distinct()
                .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String distinctSorted() {
        return Pattern.compile(",").splitAsStream(s)
                .filter(t -> t.length() != 4)
                .map(String::toUpperCase)
                .distinct()
                .sorted()
                .collect(Collectors.joining(","));
    }
}

Here are the results:

 Benchmark           Input          Avg time (ns/op)

 stream              3 args         574.042
 loop                3 args         393.364
 sortedDistinct      3 args         829.077
 distinctSorted      3 args         836.558

 stream              8 args         1144.488
 loop                8 args         1014.756
 sortedDistinct      8 args         1533.968
 distinctSorted      8 args         1745.055

 stream             11 args         1829.571
 loop               11 args         1514.138
 sortedDistinct     11 args         1940.256
 distinctSorted     11 args         2591.715

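The results are fairly obvious: streams are slower, but not by that much, so readability probably wins. Also, Holger is right (but he rarely, if ever, isn't).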
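To put "not by that much" into numbers, the sketch below computes the ratio of the stream() score to the loop() score for each input size. The scores are copied from the results above; the class name `StreamOverhead` and the helper `ratio` are assumptions made for illustration.

```java
public class StreamOverhead {

    // Hypothetical helper: how many times slower stream() is than loop().
    static double ratio(double streamNs, double loopNs) {
        return streamNs / loopNs;
    }

    public static void main(String[] args) {
        // {stream, loop} average times in ns/op, copied from the results above
        double[][] scores = {
                {574.042, 393.364},
                {1144.488, 1014.756},
                {1829.571, 1514.138}
        };
        int[] tokenCounts = {3, 8, 11};

        for (int i = 0; i < scores.length; i++) {
            System.out.printf("%d tokens: stream/loop = %.2f%n",
                    tokenCounts[i], ratio(scores[i][0], scores[i][1]));
        }
    }
}
```

For these three inputs the stream version comes out roughly 1.1 to 1.5 times slower than the loop, with the largest relative gap on the smallest input.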

Answer 2 (score: 2)

It took me a while to build a test that I would be fairly happy with, so that I could actually judge the numbers I would get...

@BenchmarkMode(value = { Mode.AverageTime })
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class StreamVsLoop {

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(StreamVsLoop.class.getSimpleName())
                .jvmArgs("-ea")
                .shouldFailOnError(true)
                .build();
        new Runner(opt).run();
    }

    @State(Scope.Thread)
    public static class StringInput {

        private String[] letters = { "q", "a", "z", "w", "s", "x", "e", "d", "c", "r", "f", "v", "t", "g", "b",
                "y", "h", "n", "u", "j", "m", "i", "k", "o", "l", "p" };

        public String s = "";

        @Param(value = { "1000", "10000", "100000" })
        int next;

        @TearDown(Level.Iteration)
        public void tearDown() {
            if (next == 1000) {
                long count = Arrays.stream(s.split(",")).filter(x -> x.length() == 5).count();
                assert count == 99;
            }

            if (next == 10000) {
                long count = Arrays.stream(s.split(",")).filter(x -> x.length() == 5).count();
                assert count == 999;
            }

            if (next == 100000) {
                long count = Arrays.stream(s.split(",")).filter(x -> x.length() == 5).count();
                assert count == 9999;
            }
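            s = ""; // reset to empty rather than null: setUp() appends to s, and null + "x" would prepend the text "null"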
        }

        /**
         * A very brute-force attempt to have 1/2 of the elements filtered and 1/2 not;
         * highly inefficient, but this is not part of the measurement, so who cares?
         */
        @Setup(Level.Iteration)
        public void setUp() {

            for (int i = 0; i < next; i++) {
                int index = ThreadLocalRandom.current().nextInt(0, letters.length);
                String letter = letters[index];
                if (next == 1000) {
                    if (i < 500 && i % 4 == 0) {
                        s = s + "," + letter;
                    } else if (i > 500 && i % 5 == 0) {
                        s = s + "," + letter;
                    } else {
                        s = s + letter;
                    }

                } else if (next == 10000) {
                    if (i < 5000 && i % 4 == 0) {
                        s = s + "," + letter;
                    } else if (i > 5000 && i % 5 == 0) {
                        s = s + "," + letter;
                    } else {
                        s = s + letter;
                    }
                } else if (next == 100000) {
                    if (i < 50000 && i % 4 == 0) {
                        s = s + "," + letter;
                    } else if (i > 50000 && i % 5 == 0) {
                        s = s + "," + letter;
                    } else {
                        s = s + letter;
                    }
                }
            }
        }
    }

    @Benchmark
    @Fork
    public String stream(StringInput si) {
        Stream<String> tokens = Arrays.stream(si.s.split(","));
        return tokens.filter(t -> t.length() != 4)
                .map(String::toUpperCase)
                .sorted().distinct()
                .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String loop(StringInput si) {
        String[] tokens = si.s.split(",");
        Set<String> resultSet = new TreeSet<>();
        for (String t : tokens) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    @Benchmark
    @Fork(1)
    public String sortedDistinct(StringInput si) {
        return Pattern.compile(",").splitAsStream(si.s)
                .filter(t -> t.length() != 4)
                .map(String::toUpperCase)
                .sorted()
                .distinct()
                .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String distinctSorted(StringInput si) {
        return Pattern.compile(",").splitAsStream(si.s)
                .filter(t -> t.length() != 4)
                .map(String::toUpperCase)
                .distinct()
                .sorted()
                .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String variant3(StringInput si) {
        String s = si.s;
        Set<String> resultSet = new TreeSet<>();
        int o = 0, p;
        for (p = s.indexOf(','); p >= 0; p = s.indexOf(',', o = p + 1)) {
            if (p - o == 4) {
                continue;
            }
            resultSet.add(s.substring(o, p).toUpperCase());
        }
        if (s.length() - o != 4) {
            resultSet.add(s.substring(o).toUpperCase());
        }
        return String.join(",", resultSet);
    }
}

Results:

Benchmark                                 (next)   Avg time (ms/op)

streamvsLoop.StreamVsLoop.distinctSorted    1000   0.028
streamvsLoop.StreamVsLoop.sortedDistinct    1000   0.024
streamvsLoop.StreamVsLoop.loop              1000   0.016
streamvsLoop.StreamVsLoop.stream            1000   0.020
streamvsLoop.StreamVsLoop.variant3          1000   0.012

streamvsLoop.StreamVsLoop.distinctSorted   10000   0.394
streamvsLoop.StreamVsLoop.sortedDistinct   10000   0.359
streamvsLoop.StreamVsLoop.loop             10000   0.274
streamvsLoop.StreamVsLoop.stream           10000   0.304 ± 0.006
streamvsLoop.StreamVsLoop.variant3         10000   0.234

streamvsLoop.StreamVsLoop.distinctSorted  100000   4.950
streamvsLoop.StreamVsLoop.sortedDistinct  100000   4.432
streamvsLoop.StreamVsLoop.loop            100000   5.457
streamvsLoop.StreamVsLoop.stream          100000   3.927 ± 0.048
streamvsLoop.StreamVsLoop.variant3        100000   3.595

Holger's approach wins, but boy, the difference between the other solutions is small once the code is hot enough.