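My method takes a String of comma-separated words and returns a String of comma-separated words that contains the words in natural sort order, without any 4-letter words, with all words in UPPER case and with no duplicates. The first approach is quite slow compared to the second. Can you help me understand why, and how I can improve my approach?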
Approach 1:
public String stringProcessing(String s){
    Stream<String> tokens = Arrays.stream(s.split(","));
    return tokens.filter(t -> t.length() != 4)
                 .distinct()
                 .sorted()
                 .collect(Collectors.joining(",")).toUpperCase();
}
Approach 2:
public String processing(String s) {
    String[] tokens = s.split(",");
    Set<String> resultSet = new TreeSet<>();
    for(String t:tokens){
        if(t.length() != 4)
            resultSet.add(t.toUpperCase());
    }
    StringBuilder result = new StringBuilder();
    resultSet.forEach(key -> {
        result.append(key).append(",");
    });
    result.deleteCharAt(result.length()-1);
    return result.toString();
}
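Answer 0 (score: 10)
A performance comparison that does not document the JRE version used, the input data sets, or the benchmarking methodology is not suitable for drawing any conclusions.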
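Further, there is a fundamental difference between your variants. Your first variant applies distinct() to the original strings, so it may retain far more elements than the second variant; it then joins all of them into one string and only afterwards converts the complete result string to upper case. In contrast, your second variant converts the individual elements before adding them to the set, so only strings with a distinct upper-case representation are processed further. As a result, the second variant may need significantly less memory and may process fewer elements when joining.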
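To make the difference concrete, here is a small sketch that runs both approaches from the question on the same input. The class name and the sample input are made up for illustration; the method bodies follow the question's code, with the second one using String.join for brevity.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class DistinctBeforeUpperCaseDemo {
    // Approach 1 from the question: distinct() sees the original, mixed-case tokens.
    static String approach1(String s) {
        return Arrays.stream(s.split(","))
                .filter(t -> t.length() != 4)
                .distinct()
                .sorted()
                .collect(Collectors.joining(","))
                .toUpperCase();
    }

    // Approach 2 from the question: tokens are upper-cased before de-duplication.
    static String approach2(String s) {
        Set<String> resultSet = new TreeSet<>();
        for (String t : s.split(",")) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    public static void main(String[] args) {
        String input = "a,A,b"; // hypothetical sample input
        System.out.println(approach1(input)); // "a" and "A" both survive distinct()
        System.out.println(approach2(input)); // only one "A" remains
    }
}
```

For this input the first approach yields A,A,B, while the second yields A,B, so the two methods do not even compute the same result whenever the input contains words that differ only in case.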
So, since the two are doing entirely different things, it does not make much sense to compare the performance of these operations. A better comparison would be between these two variants:
public String variant1(String s){
    Stream<String> tokens = Arrays.stream(s.split(","));
    return tokens.filter(t -> t.length() != 4)
                 .map(String::toUpperCase)
                 .sorted().distinct()
                 .collect(Collectors.joining(","));
}
public String variant2(String s) {
    String[] tokens = s.split(",");
    Set<String> resultSet = new TreeSet<>();
    for(String t:tokens){
        if(t.length() != 4)
            resultSet.add(t.toUpperCase());
    }
    return String.join(",", resultSet);
}
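Note that I changed the order of sorted() and distinct(); as discussed in this answer, applying distinct() directly after sorted() allows the distinct operation to exploit the sorted nature of the stream.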
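The reason a sorted input helps can be sketched by hand: in sorted data, equal elements are adjacent, so duplicates can be dropped by comparing each element only with the previous one that was kept. The helper below is an illustration of that idea under made-up names, not the JDK's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class SortedDedupSketch {
    // Removes duplicates from an already sorted list by comparing each
    // element only with the last element that was kept; no hash set is needed.
    static <T> List<T> dedupSorted(List<T> sorted) {
        List<T> result = new ArrayList<>();
        for (T element : sorted) {
            if (result.isEmpty()
                    || !Objects.equals(result.get(result.size() - 1), element)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("A", "A", "B", "CCC", "CCC");
        System.out.println(dedupSorted(sorted));
    }
}
```

Note that this shortcut is only valid when the input really is sorted; on unsorted input it would miss duplicates that are not adjacent.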
You may also consider not creating a temporary array holding all the substrings before streaming over them:
public String variant1(String s){
    return Pattern.compile(",").splitAsStream(s)
                  .filter(t -> t.length() != 4)
                  .map(String::toUpperCase)
                  .sorted().distinct()
                  .collect(Collectors.joining(","));
}
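If this method is called often, one small refinement you might try is compiling the pattern once and reusing it, since Pattern instances are immutable and safe to share. The sketch below assumes the delimiter never changes; the class and constant names are made up, and whether this makes a measurable difference is something only a benchmark can tell.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PrecompiledSplit {
    // Assumption: the delimiter is fixed, so the pattern is compiled only once.
    private static final Pattern COMMA = Pattern.compile(",");

    static String variant1(String s) {
        return COMMA.splitAsStream(s)
                .filter(t -> t.length() != 4)
                .map(String::toUpperCase)
                .sorted().distinct()
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(variant1("b,a,dddd,a,ccc"));
    }
}
```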
You may also add a third variant,
public String variant3(String s) {
    Set<String> resultSet = new TreeSet<>();
    int o = 0, p;
    for(p = s.indexOf(','); p>=0; p = s.indexOf(',', o=p+1)) {
        if(p-o == 4) continue;
        resultSet.add(s.substring(o, p).toUpperCase());
    }
    if(s.length()-o != 4) resultSet.add(s.substring(o).toUpperCase());
    return String.join(",", resultSet);
}
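which does not create an array of substrings and even skips creating the substrings that do not match the filter criteria. This is not meant to suggest going that low-level in production code, but rather to show that there may always be a faster variant. So it does not matter whether the variant you are using is the fastest one; what matters is whether it runs at reasonable speed while being maintainable.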
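Before comparing the speed of hand-rolled code like this, it is worth checking that it still computes the same result. The sketch below copies variant2 and variant3 into a made-up class and runs them side by side on hypothetical sample inputs.

```java
import java.util.Set;
import java.util.TreeSet;

class VariantEquivalenceCheck {
    // variant2 from above: split into an array, then collect into a TreeSet.
    static String variant2(String s) {
        Set<String> resultSet = new TreeSet<>();
        for (String t : s.split(",")) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    // variant3 from above: scan for commas by index, no intermediate array.
    static String variant3(String s) {
        Set<String> resultSet = new TreeSet<>();
        int o = 0, p;
        for (p = s.indexOf(','); p >= 0; p = s.indexOf(',', o = p + 1)) {
            if (p - o == 4) continue;
            resultSet.add(s.substring(o, p).toUpperCase());
        }
        if (s.length() - o != 4) resultSet.add(s.substring(o).toUpperCase());
        return String.join(",", resultSet);
    }

    public static void main(String[] args) {
        String typical = "b,a,dddd,a,ccc"; // hypothetical sample input
        System.out.println(variant2(typical).equals(variant3(typical)));

        String trailingComma = "a,b,";     // edge case: trailing delimiter
        System.out.println(variant2(trailingComma));
        System.out.println(variant3(trailingComma));
    }
}
```

The two agree on ordinary input, but not on a trailing comma: String.split drops trailing empty strings, whereas the index-based loop adds the empty remainder to the set. That is another reason to prefer the simpler code unless the low-level version is really needed.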
Answer 1 (score: 7)
I guess it was only a matter of time before someone actually posted some JMH tests. I took Holger's methods and tested them:
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(value = { Mode.AverageTime })
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class StreamVsLoop {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(StreamVsLoop.class.getSimpleName())
                                          .build();
        new Runner(opt).run();
    }

    // Inputs with 3, 8 and 11 comma-separated tokens.
    @Param(value = {
        "a, b, c",
        "a, bb, ccc, dddd, eeeee, ffffff, ggggggg, hhhhhhhh",
        "a, bb, ccc, dddd, eeeee, ffffff, ggggggg, hhhhhhhh, ooooooooo, tttttttttttttt, mmmmmmmmmmmmmmmmmm" })
    String s;

    @Benchmark
    @Fork(1)
    public String stream() {
        Stream<String> tokens = Arrays.stream(s.split(","));
        return tokens.filter(t -> t.length() != 4)
                     .map(String::toUpperCase)
                     .sorted().distinct()
                     .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String loop() {
        String[] tokens = s.split(",");
        Set<String> resultSet = new TreeSet<>();
        for (String t : tokens) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    @Benchmark
    @Fork(1)
    public String sortedDistinct() {
        return Pattern.compile(",").splitAsStream(s)
                      .filter(t -> t.length() != 4)
                      .map(String::toUpperCase)
                      .sorted()
                      .distinct()
                      .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String distinctSorted() {
        return Pattern.compile(",").splitAsStream(s)
                      .filter(t -> t.length() != 4)
                      .map(String::toUpperCase)
                      .distinct()
                      .sorted()
                      .collect(Collectors.joining(","));
    }
}
Here are the results:
Benchmark        (s)        Score (ns/op)
stream           3 args          574.042
loop             3 args          393.364
sortedDistinct   3 args          829.077
distinctSorted   3 args          836.558
stream           8 args         1144.488
loop             8 args         1014.756
sortedDistinct   8 args         1533.968
distinctSorted   8 args         1745.055
stream           11 args        1829.571
loop             11 args        1514.138
sortedDistinct   11 args        1940.256
distinctSorted   11 args        2591.715
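The results are more or less what you would expect: streams are slower, but not by that much, so readability probably wins. Also, Holger is right (but he rarely, if ever, isn't).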
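To put a number on "not by that much", one can divide the stream score by the loop score for each input size. The snippet below is just that arithmetic over the figures reported above; the class and method names are made up.

```java
class StreamVsLoopRatios {
    // Scores in ns/op, copied from the results above (3, 8 and 11 tokens).
    static final double[] STREAM = {574.042, 1144.488, 1829.571};
    static final double[] LOOP = {393.364, 1014.756, 1514.138};

    // How many times slower the stream variant is than the loop for input i.
    static double ratio(int i) {
        return STREAM[i] / LOOP[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < STREAM.length; i++) {
            System.out.println(String.format("stream/loop = %.2f", ratio(i)));
        }
    }
}
```

On these tiny inputs the stream variant comes out roughly 1.1 to 1.5 times slower than the loop, which is a few hundred nanoseconds per call in absolute terms.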
Answer 2 (score: 2)
It took me a while to build a test that I would be reasonably happy with, so that I could actually judge the numbers I get...
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(value = { Mode.AverageTime })
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class StreamVsLoop {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(StreamVsLoop.class.getSimpleName())
                                          .jvmArgs("-ea")
                                          .shouldFailOnError(true)
                                          .build();
        new Runner(opt).run();
    }

    @State(Scope.Thread)
    public static class StringInput {
        private String[] letters = { "q", "a", "z", "w", "s", "x", "e", "d", "c", "r", "f", "v", "t", "g", "b",
                "y", "h", "n", "u", "j", "m", "i", "k", "o", "l", "p" };
        public String s = "";

        @Param(value = { "1000", "10000", "100000" })
        int next;

        @TearDown(Level.Iteration)
        public void tearDown() {
            if (next == 1000) {
                long count = Arrays.stream(s.split(",")).filter(x -> x.length() == 5).count();
                assert count == 99;
            }
            if (next == 10000) {
                long count = Arrays.stream(s.split(",")).filter(x -> x.length() == 5).count();
                assert count == 999;
            }
            if (next == 100000) {
                long count = Arrays.stream(s.split(",")).filter(x -> x.length() == 5).count();
                assert count == 9999;
            }
            // Reset to the empty string rather than null: setUp concatenates onto s,
            // so a null here would make the next input start with the text "null".
            s = "";
        }

        /**
         * A very brute-force attempt to have 1/2 of the elements filtered and 1/2 not.
         * Highly inefficient, but this is not part of the measurement, so who cares?
         */
        @Setup(Level.Iteration)
        public void setUp() {
            for (int i = 0; i < next; i++) {
                int index = ThreadLocalRandom.current().nextInt(0, letters.length);
                String letter = letters[index];
                if (next == 1000) {
                    if (i < 500 && i % 4 == 0) {
                        s = s + "," + letter;
                    } else if (i > 500 && i % 5 == 0) {
                        s = s + "," + letter;
                    } else {
                        s = s + letter;
                    }
                } else if (next == 10000) {
                    if (i < 5000 && i % 4 == 0) {
                        s = s + "," + letter;
                    } else if (i > 5000 && i % 5 == 0) {
                        s = s + "," + letter;
                    } else {
                        s = s + letter;
                    }
                } else if (next == 100000) {
                    if (i < 50000 && i % 4 == 0) {
                        s = s + "," + letter;
                    } else if (i > 50000 && i % 5 == 0) {
                        s = s + "," + letter;
                    } else {
                        s = s + letter;
                    }
                }
            }
        }
    }

    @Benchmark
    @Fork
    public String stream(StringInput si) {
        Stream<String> tokens = Arrays.stream(si.s.split(","));
        return tokens.filter(t -> t.length() != 4)
                     .map(String::toUpperCase)
                     .sorted().distinct()
                     .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String loop(StringInput si) {
        String[] tokens = si.s.split(",");
        Set<String> resultSet = new TreeSet<>();
        for (String t : tokens) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    @Benchmark
    @Fork(1)
    public String sortedDistinct(StringInput si) {
        return Pattern.compile(",").splitAsStream(si.s)
                      .filter(t -> t.length() != 4)
                      .map(String::toUpperCase)
                      .sorted()
                      .distinct()
                      .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String distinctSorted(StringInput si) {
        return Pattern.compile(",").splitAsStream(si.s)
                      .filter(t -> t.length() != 4)
                      .map(String::toUpperCase)
                      .distinct()
                      .sorted()
                      .collect(Collectors.joining(","));
    }

    @Benchmark
    @Fork(1)
    public String variant3(StringInput si) {
        String s = si.s;
        Set<String> resultSet = new TreeSet<>();
        int o = 0, p;
        for (p = s.indexOf(','); p >= 0; p = s.indexOf(',', o = p + 1)) {
            if (p - o == 4) {
                continue;
            }
            resultSet.add(s.substring(o, p).toUpperCase());
        }
        if (s.length() - o != 4) {
            resultSet.add(s.substring(o).toUpperCase());
        }
        return String.join(",", resultSet);
    }
}
Benchmark                                   (next)  Score (ms/op)
streamvsLoop.StreamVsLoop.distinctSorted      1000  0.028
streamvsLoop.StreamVsLoop.sortedDistinct      1000  0.024
streamvsLoop.StreamVsLoop.loop                1000  0.016
streamvsLoop.StreamVsLoop.stream              1000  0.020
streamvsLoop.StreamVsLoop.variant3            1000  0.012
streamvsLoop.StreamVsLoop.distinctSorted     10000  0.394
streamvsLoop.StreamVsLoop.sortedDistinct     10000  0.359
streamvsLoop.StreamVsLoop.loop               10000  0.274
streamvsLoop.StreamVsLoop.stream             10000  0.304 ± 0.006
streamvsLoop.StreamVsLoop.variant3           10000  0.234
streamvsLoop.StreamVsLoop.distinctSorted    100000  4.950
streamvsLoop.StreamVsLoop.sortedDistinct    100000  4.432
streamvsLoop.StreamVsLoop.loop              100000  5.457
streamvsLoop.StreamVsLoop.stream            100000  3.927 ± 0.048
streamvsLoop.StreamVsLoop.variant3          100000  3.595
Holger's approach (variant3) wins, but boy, the difference between the other solutions is small once the code is hot enough.
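As an aside, the input generator in the benchmark repeats the same branch for each size. A more compact sketch of the same layout (4-letter tokens in the first half, 5-letter tokens in the second half) might look like the following. The class and method names are made up, and the seeded Random is an assumption chosen only to make the sketch repeatable; the benchmark itself uses ThreadLocalRandom.

```java
import java.util.Arrays;
import java.util.Random;

class BenchmarkInput {
    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";

    // Builds n random letters, inserting a comma before every 4th letter in the
    // first half and before every 5th letter in the second half.
    static String generate(int n, Random random) {
        StringBuilder sb = new StringBuilder(n + n / 4);
        int half = n / 2;
        for (int i = 0; i < n; i++) {
            boolean firstHalfBreak = i < half && i % 4 == 0;
            boolean secondHalfBreak = i > half && i % 5 == 0;
            if (firstHalfBreak || secondHalfBreak) {
                sb.append(',');
            }
            sb.append(LETTERS.charAt(random.nextInt(LETTERS.length())));
        }
        return sb.toString();
    }

    // Counts the tokens of the given length, like the check in tearDown.
    static long countOfLength(String s, int length) {
        return Arrays.stream(s.split(",")).filter(t -> t.length() == length).count();
    }

    public static void main(String[] args) {
        String s = generate(1000, new Random(42)); // seed chosen arbitrarily
        System.out.println(countOfLength(s, 5));
    }
}
```

For n = 1000 this produces 99 tokens of length 5, which matches the assertion in the benchmark's tearDown method.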