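I have a StringBuilder object that needs to be trimmed (i.e. all characters at or below \u0020 removed from both ends).
I can't seem to find a method on StringBuilder that does this.
Here is what I'm doing now:
String trimmedStr = strBuilder.toString().trim();
This gives the desired output, but it requires allocating two Strings instead of one. Is there a more efficient way to trim the string while it is still in the StringBuilder?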
Answer 0 (score: 24)
You should not use the deleteCharAt approach.
As Boris pointed out, the deleteCharAt method copies the array every time. The Java 5 code that does this looks like this:
    public AbstractStringBuilder deleteCharAt(int index) {
        if ((index < 0) || (index >= count))
            throw new StringIndexOutOfBoundsException(index);
        System.arraycopy(value, index+1, value, index, count-index-1);
        count--;
        return this;
    }
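Of course, speculation alone is not enough to choose one optimization over another, so I decided to time the 3 approaches in this thread: the original, the delete approach, and the substring approach.
Here is the code I tested for the original: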
    public static String trimOriginal(StringBuilder sb) {
        return sb.toString().trim();
    }
The delete approach:
    public static String trimDelete(StringBuilder sb) {
        while (sb.length() > 0 && Character.isWhitespace(sb.charAt(0))) {
            sb.deleteCharAt(0);
        }
        while (sb.length() > 0 && Character.isWhitespace(sb.charAt(sb.length() - 1))) {
            sb.deleteCharAt(sb.length() - 1);
        }
        return sb.toString();
    }
The substring approach:
    public static String trimSubstring(StringBuilder sb) {
        int first, last;
        for (first=0; first<sb.length(); first++)
            if (!Character.isWhitespace(sb.charAt(first)))
                break;
        for (last=sb.length(); last>first; last--)
            if (!Character.isWhitespace(sb.charAt(last-1)))
                break;
        return sb.substring(first, last);
    }
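One detail worth checking: the question defines trimming as removing characters at or below \u0020, which is what String.trim() does, whereas Character.isWhitespace uses a different character set (for example, it rejects \u0000 and accepts U+2003 EM SPACE). If exact trim() semantics matter, a minimal sketch of the substring approach could compare against ' ' instead. The class and method names here are hypothetical and not part of the original answer.

```java
final class TrimLikeString {
    // Variant of the substring approach that mirrors String.trim():
    // every char <= '\u0020' at either end is skipped.
    static String trimLikeStringTrim(StringBuilder sb) {
        int first = 0;
        int last = sb.length();
        while (first < last && sb.charAt(first) <= ' ') {
            first++;
        }
        while (last > first && sb.charAt(last - 1) <= ' ') {
            last--;
        }
        // Only one String is allocated, directly from the builder's contents.
        return sb.substring(first, last);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("\t  hello world \n");
        System.out.println("[" + trimLikeStringTrim(sb) + "]");
    }
}
```

For input made only of ordinary spaces, tabs and newlines, both variants should give the same result; they differ only on the less common characters mentioned above.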
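I ran 100 tests, each time generating a million-character StringBuilder with ten thousand leading and ten thousand trailing spaces. The test itself is very basic, but it gives a good idea of how long each approach takes.
Here is the code that times the 3 approaches: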
    public static void main(String[] args) {
        long originalTime = 0;
        long deleteTime = 0;
        long substringTime = 0;
        for (int i=0; i<100; i++) {
            StringBuilder sb1 = new StringBuilder();
            StringBuilder sb2 = new StringBuilder();
            StringBuilder sb3 = new StringBuilder();
            for (int j=0; j<10000; j++) {
                sb1.append(" ");
                sb2.append(" ");
                sb3.append(" ");
            }
            for (int j=0; j<980000; j++) {
                sb1.append("a");
                sb2.append("a");
                sb3.append("a");
            }
            for (int j=0; j<10000; j++) {
                sb1.append(" ");
                sb2.append(" ");
                sb3.append(" ");
            }
            long timer1 = System.currentTimeMillis();
            trimOriginal(sb1);
            originalTime += System.currentTimeMillis() - timer1;
            long timer2 = System.currentTimeMillis();
            trimDelete(sb2);
            deleteTime += System.currentTimeMillis() - timer2;
            long timer3 = System.currentTimeMillis();
            trimSubstring(sb3);
            substringTime += System.currentTimeMillis() - timer3;
        }
        System.out.println("original: " + originalTime + " ms");
        System.out.println("delete: " + deleteTime + " ms");
        System.out.println("substring: " + substringTime + " ms");
    }
I got the following output:
original: 176 ms
delete: 179242 ms
substring: 154 ms
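As we can see, the substring approach offers only a very slight improvement over the original "two String" approach. The delete approach, however, is extremely slow and should be avoided.
So, to answer your question: you are fine trimming your StringBuilder the way you suggested in the question. The very slight gain from the substring approach probably does not justify the extra code.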
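The gap for the delete approach follows from the System.arraycopy call in deleteCharAt quoted earlier: each deletion at index 0 shifts every remaining character one slot to the left. As a rough illustration (a hypothetical helper, not part of the benchmark), the number of characters moved can be counted like this:

```java
final class DeleteCostSketch {
    // Counts the characters System.arraycopy would move if deleteCharAt(0)
    // were called `deletions` times on a buffer holding `length` characters.
    // Each call moves (count - index - 1) characters with index == 0.
    static long charsMovedByFrontDeletes(long length, long deletions) {
        long moved = 0;
        for (long k = 0; k < deletions; k++) {
            moved += length - k - 1;
        }
        return moved;
    }

    public static void main(String[] args) {
        // Sizes taken from the benchmark: one million characters,
        // ten thousand of them leading spaces.
        System.out.println(charsMovedByFrontDeletes(1_000_000, 10_000));
    }
}
```

With the benchmark's sizes this comes to roughly ten billion character moves per trimDelete call for the leading spaces alone. The trailing deletions are cheap by comparison, because deleting the last character leaves nothing to shift.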
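Answer 1 (score: 2)
Don't worry about having two strings. It is a micro-optimization.
If you really have found a bottleneck, you can trim in nearly constant time: just iterate over the first N characters for as long as Character.isWhitespace(c) holds, and do the same from the end.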
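Answer 2 (score: 2)
I used Zaven's analysis approach with StringBuilder's delete(start, end) method, which performs far better than deleteCharAt(index) but slightly worse than substring(). This method also uses an array copy, but the copy is invoked far fewer times (only twice in the worst case). In addition, it avoids creating multiple intermediate String instances when trim() is called repeatedly on the same StringBuilder object.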
    public class Main {
        public static String trimOriginal(StringBuilder sb) {
            return sb.toString().trim();
        }

        public static String trimDeleteRange(StringBuilder sb) {
            int first, last;
            for (first = 0; first < sb.length(); first++)
                if (!Character.isWhitespace(sb.charAt(first)))
                    break;
            for (last = sb.length(); last > first; last--)
                if (!Character.isWhitespace(sb.charAt(last - 1)))
                    break;
            if (first == last) {
                sb.delete(0, sb.length());
            } else {
                if (last < sb.length()) {
                    sb.delete(last, sb.length());
                }
                if (first > 0) {
                    sb.delete(0, first);
                }
            }
            return sb.toString();
        }

        public static String trimSubstring(StringBuilder sb) {
            int first, last;
            for (first = 0; first < sb.length(); first++)
                if (!Character.isWhitespace(sb.charAt(first)))
                    break;
            for (last = sb.length(); last > first; last--)
                if (!Character.isWhitespace(sb.charAt(last - 1)))
                    break;
            return sb.substring(first, last);
        }

        public static void main(String[] args) {
            runAnalysis(1000);
            runAnalysis(10000);
            runAnalysis(100000);
            runAnalysis(200000);
            runAnalysis(500000);
            runAnalysis(1000000);
        }

        private static void runAnalysis(int stringLength) {
            System.out.println("Main:runAnalysis(string-length=" + stringLength + ")");
            long originalTime = 0;
            long deleteTime = 0;
            long substringTime = 0;
            for (int i = 0; i < 200; i++) {
                StringBuilder temp = new StringBuilder();
                char[] options = {' ', ' ', ' ', ' ', 'a', 'b', 'c', 'd'};
                for (int j = 0; j < stringLength; j++) {
                    temp.append(options[(int) ((Math.random() * 1000)) % options.length]);
                }
                String testStr = temp.toString();
                StringBuilder sb1 = new StringBuilder(testStr);
                StringBuilder sb2 = new StringBuilder(testStr);
                StringBuilder sb3 = new StringBuilder(testStr);
                long timer1 = System.currentTimeMillis();
                trimOriginal(sb1);
                originalTime += System.currentTimeMillis() - timer1;
                long timer2 = System.currentTimeMillis();
                trimDeleteRange(sb2);
                deleteTime += System.currentTimeMillis() - timer2;
                long timer3 = System.currentTimeMillis();
                trimSubstring(sb3);
                substringTime += System.currentTimeMillis() - timer3;
            }
            System.out.println(" original: " + originalTime + " ms");
            System.out.println(" delete-range: " + deleteTime + " ms");
            System.out.println(" substring: " + substringTime + " ms");
        }
    }
Output:
Main:runAnalysis(string-length=1000)
original: 0 ms
delete-range: 4 ms
substring: 0 ms
Main:runAnalysis(string-length=10000)
original: 4 ms
delete-range: 9 ms
substring: 4 ms
Main:runAnalysis(string-length=100000)
original: 22 ms
delete-range: 33 ms
substring: 43 ms
Main:runAnalysis(string-length=200000)
original: 57 ms
delete-range: 93 ms
substring: 110 ms
Main:runAnalysis(string-length=500000)
original: 266 ms
delete-range: 220 ms
substring: 191 ms
Main:runAnalysis(string-length=1000000)
original: 479 ms
delete-range: 467 ms
substring: 426 ms
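Answer 3 (score: 1)
Only one of you has taken into account that when you convert the StringBuilder to a String and then trim it, you create two immutable objects that have to be garbage collected, so the total allocation is the StringBuilder plus those two Strings.
So while trim may "appear" to be faster, in the real world, under memory load, it will actually be worse.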
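Answer 4 (score: 1)
I had exactly your question at first. However, after 5 minutes of second thought, I realized that you never actually need to trim the StringBuffer! You only need to trim the strings you append to the StringBuffer.
If you want to trim the initial StringBuffer, you can do this: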
StringBuffer sb = new StringBuffer(initialStr.trim());
If you want to trim the StringBuffer on the fly, you can do it while appending:
sb.append(addOnStr.trim());
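A small sketch of this idea follows; the class and method names are hypothetical. Note one consequence of trimming each piece as it is appended: whitespace between pieces disappears as well, which is not the same as trimming the finished buffer once.

```java
final class AppendTrimSketch {
    // Trims every piece before it is appended, so the buffer never
    // holds leading or trailing whitespace from any individual piece.
    static String joinTrimmed(String... parts) {
        StringBuffer sb = new StringBuffer();
        for (String part : parts) {
            sb.append(part.trim());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The inner spaces between "a" and "b" are removed too.
        System.out.println("[" + joinTrimmed("  a ", " b  ") + "]");
    }
}
```

If the separating whitespace between pieces must survive, this approach only fits when the pieces are appended with explicit separators.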
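Answer 5 (score: 0)
You get two strings, but I would expect the data to be allocated only once. Since Strings in Java are immutable, I would expect the trim implementation to give you an object that shares the same character data but has different start and end indices. At least that is what the substring method does. So anything you do to try to optimize this will most certainly have the opposite effect, because you add overhead that is not needed.
Just step through the trim() method with your debugger.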
Answer 6 (score: 0)
I wrote some code. It works, and the test cases are there for you to see. Let me know if this is okay.
Main code -
    public static StringBuilder trimStringBuilderSpaces(StringBuilder sb) {
        int len = sb.length();
        if (len > 0) {
            int start = 0;
            int end = 1;
            char space = ' ';
            int i = 0;
            // Remove spaces at start
            for (i = 0; i < len; i++) {
                if (sb.charAt(i) != space) {
                    break;
                }
            }
            end = i;
            //System.out.println("s = " + start + ", e = " + end);
            sb.delete(start, end);
            // Remove the ending spaces
            len = sb.length();
            if (len > 1) {
                // Scan down to index 0 inclusive: with "i > 0" an input such
                // as "1 " would leave i == 0 and the whole buffer was deleted.
                for (i = len - 1; i >= 0; i--) {
                    if (sb.charAt(i) != space) {
                        i = i + 1;
                        break;
                    }
                }
                start = i;
                end = len;// or len + any positive number !
                //System.out.println("s = " + start + ", e = " + end);
                sb.delete(start, end);
            }
        }
        return sb;
    }
Full code with tests -
    package source;

    import java.io.PrintWriter;
    import java.io.StringWriter;
    import java.util.ArrayList;

    public class StringBuilderTrim {
        public static void main(String[] args) {
            testCode();
        }

        public static void testCode() {
            StringBuilder s1 = new StringBuilder("");
            StringBuilder s2 = new StringBuilder(" ");
            StringBuilder s3 = new StringBuilder(" ");
            StringBuilder s4 = new StringBuilder(" 123");
            StringBuilder s5 = new StringBuilder(" 123");
            StringBuilder s6 = new StringBuilder("1");
            StringBuilder s7 = new StringBuilder("123 ");
            StringBuilder s8 = new StringBuilder("123 ");
            StringBuilder s9 = new StringBuilder(" 123 ");
            StringBuilder s10 = new StringBuilder(" 123 ");
            /*
             * Using a rough form of TDD here. Initially, only one test input
             * "test case" was added and rest were commented. Write no code for the
             * method being tested. So, the test will fail. Write just enough code
             * to make it pass. Then, enable the next test. Repeat !!!
             */
            ArrayList<StringBuilder> ins = new ArrayList<StringBuilder>();
            ins.add(s1);
            ins.add(s2);
            ins.add(s3);
            ins.add(s4);
            ins.add(s5);
            ins.add(s6);
            ins.add(s7);
            ins.add(s8);
            ins.add(s9);
            ins.add(s10);
            // Run test
            for (StringBuilder sb : ins) {
                System.out
                        .println("\n\n---------------------------------------------");
                String expected = sb.toString().trim();
                String result = trimStringBuilderSpaces(sb).toString();
                System.out.println("In [" + sb + "]" + ", Expected [" + expected
                        + "]" + ", Out [" + result + "]");
                if (result.equals(expected)) {
                    System.out.println("Success!");
                } else {
                    System.out.println("FAILED!");
                }
                System.out.println("---------------------------------------------");
            }
        }

        public static StringBuilder trimStringBuilderSpaces(StringBuilder inputSb) {
            // Work on a copy so the caller's builder is left untouched
            StringBuilder sb = new StringBuilder(inputSb);
            int len = sb.length();
            if (len > 0) {
                try {
                    int start = 0;
                    int end = 1;
                    char space = ' ';
                    int i = 0;
                    // Remove spaces at start
                    for (i = 0; i < len; i++) {
                        if (sb.charAt(i) != space) {
                            break;
                        }
                    }
                    end = i;
                    //System.out.println("s = " + start + ", e = " + end);
                    sb.delete(start, end);
                    // Remove the ending spaces
                    len = sb.length();
                    if (len > 1) {
                        // Scan down to index 0 inclusive: with "i > 0" an input such
                        // as "1 " would leave i == 0 and the whole buffer was deleted.
                        for (i = len - 1; i >= 0; i--) {
                            if (sb.charAt(i) != space) {
                                i = i + 1;
                                break;
                            }
                        }
                        start = i;
                        end = len;// or len + any positive number !
                        //System.out.println("s = " + start + ", e = " + end);
                        sb.delete(start, end);
                    }
                } catch (Exception ex) {
                    StringWriter sw = new StringWriter();
                    PrintWriter pw = new PrintWriter(sw);
                    ex.printStackTrace(pw);
                    sw.toString(); // stack trace as a string
                    sb = new StringBuilder("\nNo Out due to error:\n" + "\n" + sw);
                    return sb;
                }
            }
            return sb;
        }
    }
Answer 7 (score: 0)
strBuilder.replace(0,strBuilder.length(),strBuilder.toString().trim());
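As a hedged illustration of what this one-liner does: StringBuilder.replace returns the same builder, so existing references to it see the trimmed contents, but toString() and trim() still allocate the two intermediate Strings the question wanted to avoid. The wrapper class and method name below are hypothetical.

```java
final class ReplaceTrimSketch {
    // Overwrites the builder's whole contents with their trimmed form.
    // The returned reference is the builder that was passed in.
    static StringBuilder trimInPlace(StringBuilder strBuilder) {
        return strBuilder.replace(0, strBuilder.length(), strBuilder.toString().trim());
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("  x y  ");
        StringBuilder same = trimInPlace(sb);
        System.out.println("[" + sb + "] same object: " + (same == sb));
    }
}
```

This is mainly useful when the builder itself must be kept, for example because other code holds a reference to it, rather than as a performance improvement.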
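Answer 8 (score: 0)
Since deleteCharAt() copies the array every time, I came up with the following code, which copies the array at most twice, in the worst case where the StringBuilder has both leading and trailing whitespace. The code below keeps the object reference unchanged and does not create a new object.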
    public static void trimStringBuilder(StringBuilder builder) {
        int len = builder.length();
        int start = 0;
        // Remove whitespace from start
        while (start < len && builder.charAt(start) == ' ') {
            start++;
        }
        if (start > 0) {
            builder.delete(0, start);
        }
        len = builder.length();
        int end = len;
        // Remove whitespace from end
        while (end > 0 && builder.charAt(end - 1) == ' ') {
            end--;
        }
        if (end < len) {
            builder.delete(end, len);
        }
    }
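A possible variation on the code above, offered only as a sketch: drop the trailing spaces first with setLength, which states the truncation directly, and only then delete the leading spaces, so the single shifting copy moves just the characters that are kept. Like the answer's code, it treats only ' ' as whitespace. The class and method names are hypothetical.

```java
final class SetLengthTrimSketch {
    // Trims spaces from both ends of the builder in place.
    static void trimInPlace(StringBuilder builder) {
        int end = builder.length();
        while (end > 0 && builder.charAt(end - 1) == ' ') {
            end--;
        }
        // Truncate the tail; no characters need to be shifted for this.
        builder.setLength(end);

        int start = 0;
        while (start < end && builder.charAt(start) == ' ') {
            start++;
        }
        if (start > 0) {
            // One copy that shifts the remaining characters to the front.
            builder.delete(0, start);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("   abc de  ");
        trimInPlace(sb);
        System.out.println("[" + sb + "]");
    }
}
```

The builder's capacity is not reduced by either approach; calling trimToSize() afterwards is an option if the memory matters.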