Question

According to Kathy Sierra's SCJP Study Guide:

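The java.lang.StringBuffer and java.lang.StringBuilder classes should be used when you have to make a lot of modifications to strings of characters. As we discussed, String objects are immutable, so if you choose to do a lot of manipulations with String objects, you will end up with a lot of abandoned String objects in the String pool.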

To clear this up, I went through the source code of the String and StringBuilder classes.

The simplified code of String looks like this:

public final class String {
    private final char[] value; // final helps with immutability, I guess.

    public String(String original) {
        value = original.value; // shares the original's char array
    }
}

A simplified version of StringBuilder looks like this:

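import java.util.Arrays;

public final class StringBuilder {
    char[] value;
    int count; // number of characters currently in use

    public StringBuilder(String str) {
        value = new char[str.length() + 16]; // 16 here is implementation dependent.
        append(str);
    }

    public StringBuilder append(String str) {
        int newCount = count + str.length();
        if (newCount > value.length) {
            // Create a new array of size newCapacity and copy the contents of 'value' into it.
            int newCapacity = Math.max((value.length + 1) * 2, newCount);
            value = Arrays.copyOf(value, newCapacity); // here the old array object is lost.
        }
        // Add the characters of 'str' to the value array, which now has room for them.
        str.getChars(0, str.length(), value, count);
        count = newCount;
        return this;
    }
}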

So let's assume we have a case like this:

Using the String class:

String s1 = "abc"; // Creates one object on String pool.
s1 = s1 + "def"; // Creates two objects - "def" on String pool
// and "abcdef" on the heap.

If I use StringBuilder, the code becomes:

StringBuilder s1 = new StringBuilder("abc");

// Creates one String object "abc" on String pool.
// And one StringBuilder object "abc" on the heap.
s1.append("def");
// Creates one String object "def" on String pool.
// And changes the char[] inside StringBuilder to hold "def" also.

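In StringBuilder s2 = s1.append("def"); there is a chance that the char array holding the string will be replaced by a new char array. The old array is then unreferenced and will be garbage collected.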

My questions are:

With simple String concatenation and with the StringBuilder append() method, the number of String objects that go to the String pool is the same.

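And according to the code listed above, StringBuilder uses a larger char array in the first place, whereas a String object contains a char array of exactly the same length as the string it holds.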

How is using StringBuilder more efficient than the plain String class for string manipulation?
Is the statement in the SCJP Guide wrong?

Answer 1

The key is the expandCapacity function:

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2; //important part here
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }
    value = Arrays.copyOf(value, newCapacity);
}

By resizing the underlying array by a factor of 2 every time a resize is needed, the amortized time needed to append one character is minimized.

Wikipedia has a good explanation:

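As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a-1), while the number of wasted cells is bounded above by (a-1)n. The choice of a depends on the library or application: some textbooks use a = 2, but Java's ArrayList implementation uses a = 3/2 and the C implementation of Python's list data structure uses a = 9/8.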

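Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must be strictly smaller than 1/a in order to support mixed sequences of insertions and removals with amortized constant cost.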

Dynamic arrays are a common example when teaching amortized analysis.
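To make the amortized argument concrete, here is a minimal sketch that only counts how many characters have to be copied during n single-character appends under two growth policies. It does not measure time or allocate real buffers. The class and method names (GrowthDemo, copiesWithExactFit, copiesWithDoubling) are made up for illustration; the starting capacity of 16 and the growth rule (capacity + 1) * 2 are taken from the expandCapacity code shown above.

```java
public class GrowthDemo {

    /** Characters copied if every append reallocates to exactly size + 1 (like s = s + "a"). */
    static long copiesWithExactFit(int n) {
        long copied = 0;
        int capacity = 0;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copied += size;        // existing contents move to the new array
                capacity = size + 1;   // room for just one more character
            }
        }
        return copied;
    }

    /** Characters copied if a full buffer grows to (capacity + 1) * 2, starting at 16. */
    static long copiesWithDoubling(int n) {
        long copied = 0;
        int capacity = 16;             // assumed default initial capacity
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copied += size;        // existing contents move to the new array
                capacity = (capacity + 1) * 2;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        int n = 200_000;
        System.out.println("exact fit: " + copiesWithExactFit(n) + " chars copied");
        System.out.println("doubling:  " + copiesWithDoubling(n) + " chars copied");
    }
}
```

With exact-fit growth the total is n(n-1)/2 copied characters, which is quadratic in n, while with geometric growth it stays below 2n.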

Now, for your particular example it would not make a difference, but you will see the effect when appending lots of characters:

public static void main(String[] args){
    int numAppends = 200000;
    int numRepetitions = 3;
    long[] time1 = new long[numRepetitions];
    long[] time2 = new long[numRepetitions];
    long now;
    for (int k = 0; k < numRepetitions; k++){
        String s = "";
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            s = s + "a";
        }
        time1[k] = (System.nanoTime() - now) / 1000000;
        StringBuilder sb = new StringBuilder();
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            sb.append("a");     
        }
        time2[k] = (System.nanoTime() - now) / 1000000;
        System.out.println("Rep "+k+", time1: "+time1[k]+ " ms, time2: " + time2[k] + " ms");
    }
}

Output:

Rep 0, time1: 13509 ms, time2: 7 ms
Rep 1, time1: 13086 ms, time2: 1 ms
Rep 2, time1: 13167 ms, time2: 1 ms

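Additionally, I counted how many times Arrays.copyOf() is called inside expandCapacity() for the benchmark: 49 times on the first iteration, but only 15 times on the second and third iterations. The output is as follows: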

newCapacity: 34
newCapacity: 70
newCapacity: 142
newCapacity: 286
newCapacity: 574
newCapacity: 1150
newCapacity: 2302
newCapacity: 4606
newCapacity: 9214
newCapacity: 18430
newCapacity: 36862
newCapacity: 73726
newCapacity: 147454
newCapacity: 294910
newCapacity: 42
Rep 2, time1: 12955 ms, time2: 134 ms

Answer 2

It is more efficient if you are building strings in a loop. If you have a loop like this:

String[] strings = { "a", "b", "c", "d" };
String result = "";
for( String s : strings) {
    result += s;
}

The StringBuilder version will generate fewer objects and cause less GC:

String[] strings = { "a", "b", "c", "d" };
StringBuilder builder = new StringBuilder();
for( String s : strings) {
    builder.append(s);
}

While the first one sends an object to the GC on every run of the loop, the second one does not.

In the end, since the string builder's array grows by doubling its size, not many allocations will occur.

Answer 3

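Manipulation is more than just concatenation. Imagine you want to insert a character in the middle of a String. How would you do that, given that Strings are immutable? You would have to create a new String. With StringBuilder you can call insert(int offset, char c).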

See the StringBuilder javadoc.

You have methods such as:

delete(int start, int end)
// Removes the characters in a substring of this sequence.

replace(int start, int end, String str)
// Replaces the characters in a substring of this sequence with characters in the specified String.

reverse()
// Causes this character sequence to be replaced by the reverse of the sequence.

insert(int dstOffset, CharSequence s)
// Inserts the specified CharSequence into this sequence.
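As a small illustration of these methods, the sketch below edits a single StringBuilder in place instead of creating a new String at each step. The class and method names (BuilderEditDemo, edit) are hypothetical; the StringBuilder calls themselves are the standard ones listed above.

```java
public class BuilderEditDemo {

    /** Applies a few in-place edits to the same buffer and returns the result. */
    static String edit(String input) {
        StringBuilder sb = new StringBuilder(input);
        sb.insert(3, '-');                         // "abcdef"  -> "abc-def"
        sb.replace(0, 3, "ABC");                   // "abc-def" -> "ABC-def"
        sb.delete(sb.length() - 1, sb.length());   // "ABC-def" -> "ABC-de"
        sb.reverse();                              // "ABC-de"  -> "ed-CBA"
        return sb.toString();                      // one String is created at the end
    }

    public static void main(String[] args) {
        System.out.println(edit("abcdef"));        // prints "ed-CBA"
    }
}
```

Doing the same with String alone would require building a fresh String, for example from substring() pieces, for every one of the four steps.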

Answer 4

How is using StringBuilder more efficient than the plain String class for string manipulation?

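It is far more efficient when you perform many manipulations in a loop. Consider any string transformation or replacement function that needs to iterate over individual characters, such as this one, which escapes the <, >, &, ", ' characters for XML or HTML: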

public static String xmlEscape(String s) {
    StringBuilder sb = new StringBuilder(
        (int)Math.min(Integer.MAX_VALUE, s.length() * 5L / 4));
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '<') sb.append("&lt;");
        else if (c == '>') sb.append("&gt;");
        else if (c == '&') sb.append("&amp;");
        else if (c == '"') sb.append("&quot;");
        else if (c == '\'') sb.append("&#039;");
        else sb.append(c);
    }
    return sb.toString();
}

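The StringBuilder array is initially sized with a capacity somewhat larger than the input string, to accommodate the original text plus likely replacements. The output text accumulates in that pre-allocated buffer, and most likely no additional memory allocation will be needed during the loop.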

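If the xmlEscape function accumulated its output in a String instead of a StringBuilder, it would copy the entire output again every time a single character was processed, degrading it to quadratic (i.e., awful!) performance.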
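The effect of pre-sizing can be checked with StringBuilder.capacity(), which reports the current size of the internal buffer. The following is only a sketch: PresizeDemo and grewDuringAppend are hypothetical names, and the check simply compares the capacity before and after one append.

```java
public class PresizeDemo {

    /** Returns true if appending text forced the builder to grow its internal buffer. */
    static boolean grewDuringAppend(int initialCapacity, String text) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        int before = sb.capacity();   // equals initialCapacity for this constructor
        sb.append(text);
        return sb.capacity() != before;
    }

    public static void main(String[] args) {
        System.out.println(grewDuringAppend(20, "hello")); // false: the text fits
        System.out.println(grewDuringAppend(2, "hello"));  // true: the buffer had to grow
    }
}
```

When the initial estimate is large enough, the whole loop runs against a single array; when it is too small, the builder still works and falls back to geometric growth.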

On the second question:

Is the statement in the SCJP Guide wrong?

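To be blunt, yes. It is extremely misleading to say that there would be "abandoned String objects in the String pool". As far as I know, the term "String pool" refers only to the intern pool, such as the one used by the String.intern() method. The only time Strings are automatically put into the intern pool is when the ClassLoader loads a class and loads the String literal constants from the source code into memory.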

Manipulating String objects at runtime certainly does not put additional objects into the intern pool (unless you deliberately call .intern()).
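This can be observed with reference comparison, as in the sketch below. InternDemo and its method names are made up for illustration, and using == on Strings is done here only to compare object identity, not as a recommended way to compare contents.

```java
public class InternDemo {

    /** Is the result of a run-time concatenation the same object as the pooled literal? */
    static boolean concatIsPooled() {
        String s1 = "abc";
        String s2 = s1 + "def";             // computed at run time: a new heap object
        return s2 == "abcdef";              // identity check against the pooled literal
    }

    /** After an explicit intern() call, the reference is the pooled one. */
    static boolean internedIsPooled() {
        String s1 = "abc";
        String s2 = (s1 + "def").intern();  // deliberately look up the intern pool
        return s2 == "abcdef";
    }

    public static void main(String[] args) {
        System.out.println(concatIsPooled());   // false
        System.out.println(internedIsPooled()); // true
    }
}
```

Because s1 is an ordinary non-final local variable, s1 + "def" is not a compile-time constant, so the compiler cannot fold it into the literal "abcdef".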

What the SCJP Guide should have said is:

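String objects are immutable, so if you choose to do a lot of manipulations with String objects, you will end up with a lot of abandoned String objects in the heap.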

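Abandoned objects on the heap are not the biggest problem, because the garbage collector will quickly eat them up. The real reason to use StringBuilders when doing multiple manipulations is to avoid unnecessary copying of characters in the first place. As shown in @jmiserez's benchmark, this makes a huge difference to performance.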

Comparing String and StringBuilder manipulation in terms of memory usage
