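I have a small demo application that shows a problem with Java's substring implementation when the string contains Unicode code points that need surrogate pairs (i.e., code points above U+FFFF, which cannot be represented by a single 2-byte char). I would like to know whether my solution works correctly or whether I'm missing something. I considered posting this on Code Review, but the question is about Java's String implementation rather than about my simple code itself.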
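To make the problem concrete: Java String indices count UTF-16 chars, not code points, so a char index can land between the two halves of a surrogate pair. A minimal sketch, assuming U+1F466 (BOY) as an arbitrary supplementary character; the class name SurrogateDemo and the helper splitsSurrogatePair are hypothetical:

```java
public class SurrogateDemo {
    // Hypothetical helper: true if a char index falls between the two halves
    // of a surrogate pair, i.e. substring() at that index would split a code point.
    static boolean splitsSurrogatePair(String s, int index) {
        return index > 0 && index < s.length()
                && Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index));
    }

    public static void main(String[] args) {
        // U+1F466 lies outside the Basic Multilingual Plane, so UTF-16
        // stores it as the surrogate pair U+D83D U+DC66 (two chars).
        String boy = new String(Character.toChars(0x1F466));

        System.out.println(boy.length());                        // 2 (UTF-16 chars)
        System.out.println(boy.codePointCount(0, boy.length())); // 1 (code point)
        System.out.println(splitsSurrogatePair(boy, 1));         // true

        // substring(0, 1) keeps only the high surrogate, an unpaired char that
        // may be printed as '?' depending on the console encoding
        String half = boy.substring(0, 1);
        System.out.println(Character.isHighSurrogate(half.charAt(0))); // true
    }
}
```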
public class SubstringTest {
    public static void main(String[] args) {
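        // Illustrative literal: four characters outside the BMP (U+1F466 to U+1F469),
        // each stored in UTF-16 as a surrogate pair of two chars
        String stringWithPlus2ByteCodePoints = "\uD83D\uDC66\uD83D\uDC67\uD83D\uDC68\uD83D\uDC69";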
        // String.substring works on char (UTF-16 code unit) indices
        String substring1 = stringWithPlus2ByteCodePoints.substring(0, 1);
        String substring2 = stringWithPlus2ByteCodePoints.substring(0, 2);
        String substring3 = stringWithPlus2ByteCodePoints.substring(1, 3);
        System.out.println(stringWithPlus2ByteCodePoints);
        System.out.println("invalid sub: " + substring1);
        System.out.println("invalid sub: " + substring2);
        System.out.println("invalid sub: " + substring3);
        // getRealSubstring works on code point indices
        String realSub1 = getRealSubstring(stringWithPlus2ByteCodePoints, 0, 1);
        String realSub2 = getRealSubstring(stringWithPlus2ByteCodePoints, 0, 2);
        String realSub3 = getRealSubstring(stringWithPlus2ByteCodePoints, 1, 3);
        System.out.println("real sub: " + realSub1);
        System.out.println("real sub: " + realSub2);
        System.out.println("real sub: " + realSub3);
    }
    private static String getRealSubstring(String string, int beginIndex, int endIndex) {
        if (string == null)
            throw new IllegalArgumentException("String should not be null");
        // The indices are code point indices, so check them against the
        // code point count rather than length() (which counts chars)
        int codePointCount = string.codePointCount(0, string.length());
        if (beginIndex < 0 || beginIndex > endIndex || endIndex > codePointCount)
            throw new IllegalArgumentException("Invalid indices");
        // Convert code point indices to char indices before calling substring
        int realBeginIndex = string.offsetByCodePoints(0, beginIndex);
        int realEndIndex = string.offsetByCodePoints(0, endIndex);
        return string.substring(realBeginIndex, realEndIndex);
    }
}
Output:
invalid sub: ?
invalid sub:
invalid sub: ??
real sub:
real sub:
real sub:
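Can I rely on my substring implementation to always return the intended substring, and thereby avoid the problems caused by Java's substring method operating on chars?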
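One way to test that reliance empirically (a sketch, not from the original post) is to compare the offset-based approach against an independent construction from String.codePoints() (Java 8+) over every valid code point range of a string that mixes BMP and supplementary characters. The class and method names (SubstringCrossCheck, viaOffsets, viaStream) and the test string are hypothetical choices:

```java
public class SubstringCrossCheck {
    // Offset-based version: same conversion as getRealSubstring, without argument checks
    static String viaOffsets(String s, int begin, int end) {
        return s.substring(s.offsetByCodePoints(0, begin), s.offsetByCodePoints(0, end));
    }

    // Independent version: rebuild the substring from the stream of code points
    static String viaStream(String s, int begin, int end) {
        int[] cps = s.codePoints().skip(begin).limit(end - begin).toArray();
        return new String(cps, 0, cps.length);
    }

    public static void main(String[] args) {
        // BMP letters interleaved with two supplementary characters:
        // 5 code points stored in 7 chars
        String s = "a" + new String(Character.toChars(0x1F466)) + "b"
                + new String(Character.toChars(0x1F467)) + "c";
        int n = s.codePointCount(0, s.length());

        // Compare both versions on every range 0 <= begin <= end <= n
        for (int begin = 0; begin <= n; begin++) {
            for (int end = begin; end <= n; end++) {
                if (!viaOffsets(s, begin, end).equals(viaStream(s, begin, end))) {
                    throw new AssertionError("mismatch at " + begin + ", " + end);
                }
            }
        }
        System.out.println("all ranges agree");
    }
}
```

Agreement on one string is evidence rather than proof, but both versions only ever cut at code point boundaries, which is the property the question is about.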
Answer (score: 2)
There is no need to walk from the start of the string up to beginIndex twice:
public String codePointSubstring(String s, int start, int end) {
    int a = s.offsetByCodePoints(0, start);
    return s.substring(a, s.offsetByCodePoints(a, end - start));
}
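A minimal runnable usage sketch of the method above, made static so it can be called from main; the class name CodePointSubstringDemo and the test string are illustrative assumptions:

```java
public class CodePointSubstringDemo {
    // Same logic as the answer: walk to 'start' once, then walk (end - start)
    // further code points from there instead of starting again at index 0
    static String codePointSubstring(String s, int start, int end) {
        int a = s.offsetByCodePoints(0, start);
        return s.substring(a, s.offsetByCodePoints(a, end - start));
    }

    public static void main(String[] args) {
        // "x", U+1F466, U+1F467, "y": 4 code points stored in 6 chars
        String s = "x" + new String(Character.toChars(0x1F466))
                + new String(Character.toChars(0x1F467)) + "y";

        // Code points 1 and 2: the two emoji, with both surrogate pairs intact
        String sub = codePointSubstring(s, 1, 3);
        System.out.println(sub.length());                        // 4
        System.out.println(sub.codePointCount(0, sub.length())); // 2
        System.out.println(sub.codePointAt(0) == 0x1F466);       // true
    }
}
```

Passing a as the starting index means the second offsetByCodePoints call only scans the end - start code points that make up the result, instead of rescanning the prefix up to start.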
Translated from this Scala snippet:
def codePointSubstring(s: String, begin: Int, end: Int): String = {
  val a = s.offsetByCodePoints(0, begin)
  s.substring(a, s.offsetByCodePoints(a, end - begin))
}
I left out the IllegalArgumentExceptions, because they don't seem to carry any more information than the exceptions that would be thrown anyway.
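To see which exceptions are thrown anyway: String.offsetByCodePoints throws IndexOutOfBoundsException when the requested offset runs past either end of the string, and String.substring throws StringIndexOutOfBoundsException (a subclass of IndexOutOfBoundsException) when the resulting char indices are out of order. A short sketch; the class name CodePointSubstringErrors and the helper tryRange are hypothetical:

```java
public class CodePointSubstringErrors {
    // The answer's method, without explicit argument checks
    static String codePointSubstring(String s, int start, int end) {
        int a = s.offsetByCodePoints(0, start);
        return s.substring(a, s.offsetByCodePoints(a, end - start));
    }

    // Hypothetical helper: "ok" if the call succeeds, otherwise the simple
    // name of the IndexOutOfBoundsException (or subclass) that was thrown
    static String tryRange(String s, int start, int end) {
        try {
            codePointSubstring(s, start, end);
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // U+1F466 followed by "y": 2 code points stored in 3 chars
        String s = new String(Character.toChars(0x1F466)) + "y";

        System.out.println(tryRange(s, 0, 2));  // ok
        System.out.println(tryRange(s, 0, 3));  // end past the last code point
        System.out.println(tryRange(s, -1, 1)); // negative start
        System.out.println(tryRange(s, 2, 1));  // start after end
    }
}
```

Because every invalid range surfaces as some IndexOutOfBoundsException, a caller can catch that one type instead of relying on a custom IllegalArgumentException.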