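This is a question about performance. I can convert uppercase to lowercase, and vice versa, with the following code:
From lowercase to uppercase: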
// Uppercase letters.
class UpperCase {
    public static void main(String[] args) {
        char ch;
        for (int i = 0; i < 10; i++) {
            ch = (char) ('a' + i);
            System.out.print(ch);
            // This statement turns off the 6th bit.
            ch = (char) ((int) ch & 65503); // ch is now uppercase
            System.out.print(ch + " ");
        }
    }
}
From uppercase to lowercase:
// Lowercase letters.
class LowerCase {
    public static void main(String[] args) {
        char ch;
        for (int i = 0; i < 10; i++) {
            ch = (char) ('A' + i);
            System.out.print(ch);
            // This statement turns on the 6th bit.
            ch = (char) ((int) ch | 32); // ch is now lowercase
            System.out.print(ch + " ");
        }
    }
}
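I know Java provides the methods .toUpperCase() and .toLowerCase(). With performance in mind, which is the fastest way to do this conversion: bitwise operations, as in the code above, or the .toUpperCase() and .toLowerCase() methods? Thanks.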
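Edit 1: Note that I use the decimal value 65503, which is 1111111111011111 in binary. I am using 16 bits, not 8. According to the answer that currently has the most votes on How many bits in a character?:
Unicode characters in the UTF-16 encoding take between 16 bits (2 bytes) and 32 bits (4 bytes), although most common characters take 16 bits. This is the encoding Windows uses internally.
The code in my question assumes UTF-16.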
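The two masks in the question operate on the same bit from opposite sides. 65503 is 0xFFDF, meaning every bit of a 16-bit char except bit 5 (value 32), and in ASCII 'a' - 'A' is exactly 32. The following minimal sketch only checks that arithmetic. The class and method names are hypothetical, and the sketch says nothing about characters outside A to Z and a to z.

```java
public class CaseMaskDemo {
    // 65503 == 0xFFDF: all 16 bits of a char set except bit 5 (value 32).
    static final int CLEAR_BIT_5 = 65503;
    // 32 == 0x20: only bit 5 set.
    static final int SET_BIT_5 = 32;

    // Clears bit 5; only gives a correct result for ASCII 'a'..'z'.
    static char upperByMask(char ch) {
        return (char) (ch & CLEAR_BIT_5);
    }

    // Sets bit 5; only gives a correct result for ASCII 'A'..'Z'.
    static char lowerByMask(char ch) {
        return (char) (ch | SET_BIT_5);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(CLEAR_BIT_5));      // 1111111111011111
        System.out.println(CLEAR_BIT_5 == (~SET_BIT_5 & 0xFFFF));     // true
        System.out.println('a' - 'A');                                 // 32
        System.out.println(upperByMask('q') + " " + lowerByMask('Q')); // Q q
    }
}
```

Writing the mask as `~32 & 0xFFFF` (or `~0x20`) arguably documents the intent better than the decimal literal 65503.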
Answer 0 (score: 5)
Yes, a method you write yourself will be slightly faster if you do the case conversion with a simple bitwise operation, because Java's methods contain more complex logic to support Unicode characters, not just the ASCII character set.
If you look at String.toLowerCase(), you'll notice there's a lot of logic in there. So if you were working on software that had to process huge amounts of ASCII and nothing else, you might actually see some benefit from a more direct approach.
But unless you are writing a program that spends most of its time converting ASCII, you won't notice any difference, even with a profiler (and if you are writing that kind of program... you should look for another job).
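Here is one way to make the "more direct approach" concrete. This minimal sketch assumes the input is expected to be almost entirely ASCII. It lowercases with a range check plus the bit-5 trick, and as soon as it meets a non-ASCII character it hands the whole string to String.toLowerCase(Locale.ROOT). The class and method names are hypothetical. Whether it actually beats the JDK on your data is something to measure, not assume.

```java
import java.util.Locale;

public class AsciiLower {
    // Lowercases s, using the bitwise trick only for ASCII 'A'..'Z'.
    // Any non-ASCII char sends the whole string to the JDK instead.
    static String toLowerAscii(String s) {
        char[] out = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch >= 0x80) {
                // Outside ASCII: let the Unicode-aware JDK method decide.
                return s.toLowerCase(Locale.ROOT);
            }
            // The range check keeps '@', '[', digits, etc. unchanged.
            out[i] = (ch >= 'A' && ch <= 'Z') ? (char) (ch | 32) : ch;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toLowerAscii("Hello, World [42]")); // hello, world [42]
    }
}
```

The range check costs a comparison per character, but without it the trick silently corrupts non-letters such as '@' and '['.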
Answer 1 (score: 3)
Stick with the provided methods .toUpperCase() and .toLowerCase(). Adding two separate classes to do what those two existing methods already do is overkill and will make your program slower (by a small margin).
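For comparison, this sketch rewrites the question's loops with the built-in Character methods this answer recommends. The output is the same letter pairs, and no knowledge of the bit layout is needed. The class and method names are hypothetical.

```java
public class BuiltInCase {
    // Builds "aA bB ..." for the first n lowercase letters.
    static String upperPairs(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            char ch = (char) ('a' + i);
            // Character.toUpperCase works for any char, not only ASCII letters.
            sb.append(ch).append(Character.toUpperCase(ch)).append(' ');
        }
        return sb.toString().trim();
    }

    // Builds "Aa Bb ..." for the first n uppercase letters.
    static String lowerPairs(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            char ch = (char) ('A' + i);
            sb.append(ch).append(Character.toLowerCase(ch)).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(upperPairs(10)); // aA bB cC dD eE fF gG hH iI jJ
        System.out.println(lowerPairs(10)); // Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj
    }
}
```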
Answer 2 (score: 3)
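Your code only works for ASCII characters. It breaks for languages that have no clear-cut lowercase/uppercase conversion, such as the German ß (please correct me if I'm wrong, my German is terrible), or when letters and symbols are written with multi-byte UTF-8 code points. If you have to handle UTF-8, correctness comes before performance, and the problem is not that simple, as the String.toLowerCase(Locale) method shows.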
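The following minimal sketch illustrates the pitfalls this answer describes. The bitwise trick corrupts non-letters. ß has no single-char uppercase form, so String.toUpperCase expands it to two characters. A locale can change the mapping itself, for example in Turkish. The class name is hypothetical. The library calls are the standard JDK ones.

```java
import java.util.Locale;

public class CasePitfalls {
    // The question's bitwise lowercase, applied blindly to any char.
    static char bitwiseLower(char ch) {
        return (char) (ch | 32);
    }

    public static void main(String[] args) {
        // Non-letters get corrupted: '[' (0x5B) becomes '{' (0x7B).
        System.out.println(bitwiseLower('['));                        // {
        // German sharp s: the String method expands it to "SS".
        System.out.println("stra\u00DFe".toUpperCase(Locale.ROOT));  // STRASSE
        // The char-level method cannot expand, so it leaves sharp s as is.
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF'); // true
        // Locale matters: Turkish lowercases 'I' to dotless i (U+0131).
        System.out.println("I".toLowerCase(Locale.forLanguageTag("tr")).equals("\u0131")); // true
    }
}
```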
Answer 3 (score: 3)
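As promised, here are two JMH benchmarks: one compares Character#toUpperCase with your bitwise method, and the other compares Character#toLowerCase with your other bitwise method. Note that only characters from the English alphabet were tested.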
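Before timing anything, it makes sense to confirm that the two variants return the same results on the benchmarked inputs. A faster method that returns something different would make the comparison meaningless. This plain-Java sketch (no JMH, hypothetical class name) checks all 26 letters in both directions. It also counts how many of the 128 ASCII chars the bitwise uppercase gets wrong, which shows what the letter-only benchmark does not cover.

```java
public class AgreementCheck {
    // True if (c & 65503) == Character.toUpperCase(c) for every 'a'..'z'
    // and (c | 32) == Character.toLowerCase(c) for every 'A'..'Z'.
    static boolean agreesOnEnglishLetters() {
        for (char ch = 'a'; ch <= 'z'; ch++) {
            if ((char) (ch & 65503) != Character.toUpperCase(ch)) {
                return false;
            }
        }
        for (char ch = 'A'; ch <= 'Z'; ch++) {
            if ((char) (ch | 32) != Character.toLowerCase(ch)) {
                return false;
            }
        }
        return true;
    }

    // Counts ASCII chars (0..127) where the bitwise uppercase differs
    // from Character.toUpperCase: digits, space, punctuation, DEL.
    static int upperMismatchesInAscii() {
        int count = 0;
        for (char ch = 0; ch < 128; ch++) {
            if ((char) (ch & 65503) != Character.toUpperCase(ch)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(agreesOnEnglishLetters()); // true
        System.out.println(upperMismatchesInAscii()); // 38
    }
}
```

The 38 mismatches are the 32 chars from 0x20 to 0x3F plus '`', '{', '|', '}', '~' and DEL. Each has bit 5 set but no uppercase form.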
First benchmark (uppercase):
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class Test {
    @Param({"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
            "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"})
    public char c;

    @Benchmark
    public char toUpperCaseNormal() {
        return Character.toUpperCase(c);
    }

    @Benchmark
    public char toUpperCaseBitwise() {
        // 65503 == 0xFFDF: clears bit 5 (value 32).
        return (char) (c & 65503);
    }
}
Output:
Benchmark                (c)  Mode  Cnt  Score   Error  Units
Test.toUpperCaseNormal     a  avgt   30  2.447 ± 0.028  ns/op
Test.toUpperCaseNormal     b  avgt   30  2.438 ± 0.035  ns/op
Test.toUpperCaseNormal     c  avgt   30  2.506 ± 0.083  ns/op
Test.toUpperCaseNormal     d  avgt   30  2.411 ± 0.010  ns/op
Test.toUpperCaseNormal     e  avgt   30  2.417 ± 0.010  ns/op
Test.toUpperCaseNormal     f  avgt   30  2.412 ± 0.005  ns/op
Test.toUpperCaseNormal     g  avgt   30  2.410 ± 0.004  ns/op
Test.toUpperCaseBitwise    a  avgt   30  1.758 ± 0.007  ns/op
Test.toUpperCaseBitwise    b  avgt   30  1.789 ± 0.032  ns/op
Test.toUpperCaseBitwise    c  avgt   30  1.763 ± 0.005  ns/op
Test.toUpperCaseBitwise    d  avgt   30  1.763 ± 0.012  ns/op
Test.toUpperCaseBitwise    e  avgt   30  1.757 ± 0.003  ns/op
Test.toUpperCaseBitwise    f  avgt   30  1.755 ± 0.003  ns/op
Test.toUpperCaseBitwise    g  avgt   30  1.759 ± 0.003  ns/op
Second benchmark (lowercase):
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class Test {
    @Param({"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
            "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"})
    public char c;

    @Benchmark
    public char toLowerCaseNormal() {
        return Character.toLowerCase(c);
    }

    @Benchmark
    public char toLowerCaseBitwise() {
        // 32 == 0x20: sets bit 5.
        return (char) (c | 32);
    }
}
Output:
Benchmark                (c)  Mode  Cnt  Score   Error  Units
Test.toLowerCaseNormal     A  avgt   30  2.084 ± 0.007  ns/op
Test.toLowerCaseNormal     B  avgt   30  2.079 ± 0.006  ns/op
Test.toLowerCaseNormal     C  avgt   30  2.081 ± 0.005  ns/op
Test.toLowerCaseNormal     D  avgt   30  2.083 ± 0.010  ns/op
Test.toLowerCaseNormal     E  avgt   30  2.080 ± 0.005  ns/op
Test.toLowerCaseNormal     F  avgt   30  2.091 ± 0.020  ns/op
Test.toLowerCaseNormal     G  avgt   30  2.116 ± 0.061  ns/op
Test.toLowerCaseBitwise    A  avgt   30  1.708 ± 0.006  ns/op
Test.toLowerCaseBitwise    B  avgt   30  1.705 ± 0.018  ns/op
Test.toLowerCaseBitwise    C  avgt   30  1.721 ± 0.022  ns/op
Test.toLowerCaseBitwise    D  avgt   30  1.718 ± 0.010  ns/op
Test.toLowerCaseBitwise    E  avgt   30  1.706 ± 0.009  ns/op
Test.toLowerCaseBitwise    F  avgt   30  1.704 ± 0.004  ns/op
Test.toLowerCaseBitwise    G  avgt   30  1.711 ± 0.007  ns/op
I only included a few of the letters (even though all of them were tested), because they all produced similar output.
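Your bitwise methods are clearly faster, by roughly 0.4 ns per call (lowercase) to 0.7 ns per call (uppercase) in these runs. This is mainly because Character#toUpperCase and Character#toLowerCase perform logical checks (as I mentioned in a comment earlier today).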