Question

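My program currently has memory problems, and while profiling the application we found that the String.split() method uses a lot of memory. I tried using a StreamTokenizer, but that seems to make things even more complicated.

Is there a better way to split long Strings into small Strings that uses less memory than the String.split() method?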

Answer 1

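Any realistic use of split is very unlikely to "consume a lot of memory". Your input would have to be huge (many, many megabytes) and your result would have to be split into many millions of parts before it would even be noticed.

Here is some code that creates a random string of approximately 1.8 million characters, splits it into over 1 million Strings, and prints the memory used and the time taken.

As you can see, it isn't much: about 72 MB consumed in roughly 350 ms.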

public class SplitMemoryTest {
    public static void main(String[] args) throws Exception {
        // Build a random input string of roughly 1.8 million characters.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 99999; i++) {
            sb.append(Math.random());
        }
        long begin = System.currentTimeMillis();
        String string = sb.toString();
        sb = null;   // drop the builder so it can be collected
        System.gc(); // request a collection before taking the baseline
        long startFreeMem = Runtime.getRuntime().freeMemory();
        // Zero-width lookahead: split in front of every digit 0-5.
        String[] strings = string.split("(?=[0-5])");
        long endFreeMem = Runtime.getRuntime().freeMemory();
        long execution = System.currentTimeMillis() - begin;

        System.out.println("input length = " + string.length() + "\nnumber of strings after split = " + strings.length + "\nmemory consumed due to split = "
                + (startFreeMem - endFreeMem) + "\nexecution time = " + execution + "ms");
    }
}

Output (run on a fairly typical Windows box):

input length = 1827035
number of strings after split = 1072788
memory consumed due to split = 71740240
execution time = 351ms

Interestingly, without the System.gc() call the reported memory use is well under half of that (about 40%):

memory consumed due to split = 29582328
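
The benchmark above compares Runtime.freeMemory() before and after the split. That figure can be skewed if the JVM grows the heap in between, so a variation is to compare used heap (totalMemory() minus freeMemory()) instead. The following is only a rough sketch under that assumption; the class name UsedHeapSketch and the helper usedHeap() are made up for illustration, and the numbers it prints will still vary with GC activity.

```java
public class UsedHeapSketch {
    // Used heap = heap currently reserved by the JVM minus the free part of it.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Same random input as in the benchmark above.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 99999; i++) {
            sb.append(Math.random());
        }
        String input = sb.toString();
        sb = null;
        System.gc(); // only a hint; the JVM may ignore it

        long before = usedHeap();
        String[] parts = input.split("(?=[0-5])");
        long after = usedHeap();

        System.out.println("number of strings after split = " + parts.length);
        System.out.println("approx. change in used heap = " + (after - before) + " bytes");
    }
}
```

For anything beyond a quick sanity check, a profiler or heap dump gives a more trustworthy picture than either approach.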

Answer 2

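Split does not create brand new strings. Internally it uses substring, which creates a new String object that points to the right sub-sequence of the original string without copying the underlying char[]. (This describes JDKs before Java 7 update 6; from 7u6 onward, substring copies the characters into a new array.)

So apart from the (slight) overhead of object creation, it should not have a huge impact from a memory perspective.

PS: StringTokenizer uses the same technique, so it would probably yield the same results as split.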
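
For reference, a minimal sketch of what StringTokenizer usage looks like. The class name TokenizerSketch and the helper tokens() are hypothetical; the tokens are collected into a list only so the result can be printed, and in real code each token could be handled inside the loop instead. Note that StringTokenizer treats its second argument as a set of delimiter characters, not a regex, and it skips empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSketch {
    // Returns the tokens of input, split on any character in delimiters.
    static List<String> tokens(String input, String delimiters) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiters);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken()); // one token at a time
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("abc,def", ",")); // prints [abc, def]
    }
}
```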

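EDIT

To see that this is the case, you can use the sample code below. It splits abc,def into abc and def, then prints the underlying char[] of the original string and of the split strings. The output shows that they are all the same array.

Output: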

Reference: [C@3590ed52  Content: [a, b, c, ,, d, e, f]
Reference: [C@3590ed52  Content: [a, b, c, ,, d, e, f]
Reference: [C@3590ed52  Content: [a, b, c, ,, d, e, f]

Code:

import java.lang.reflect.Field;
import java.util.Arrays;

public class SharedArrayTest {
    // Relies on String having a private char[] field named "value" (older JDKs).
    public static void main(String[] args) throws InterruptedException, NoSuchFieldException, IllegalArgumentException, IllegalAccessException {
        String s = "abc,def";
        String[] ss = s.split(",");
        // Use reflection to read the backing array of each String.
        Field f = String.class.getDeclaredField("value");
        f.setAccessible(true);
        System.out.println("Reference: " + f.get(s) + "\tContent: " + Arrays.toString((char[])f.get(s)));
        System.out.println("Reference: " + f.get(ss[0]) + "\tContent: " + Arrays.toString((char[])f.get(ss[0])));
        System.out.println("Reference: " + f.get(ss[1]) + "\tContent: " + Arrays.toString((char[])f.get(ss[1])));
    }
}

Answer 3

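split can cause memory problems if you only want to keep one or a few elements of the array produced from a long string. The long string will then always stay in memory. For example: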

private static List<String> headlist = new ArrayList<String>();

String longstring = ".....";
headlist.add(longstring.split(" ")[0]);

Then longstring will always stay in memory. The JVM cannot gc it.

In this situation, I think you could try

private static List<String> headlist = new ArrayList<String>();

String longstring = ".....";
headlist.add(new String(longstring.split(" ")[0]));

as in the code below

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SplitTest {
    static Random rand = new Random();
    static List<String> head = new ArrayList<String>();

    /**
     * @param args
     */
    public static void main(String[] args) {
        // i counts iterations; it was undeclared in the original snippet.
        for (int i = 0; ; i++) {
            String a = constructLongString();
            head.add(a.split(" ")[0]); //1
            //head.add(new String(a.split(" ")[0])); //2
            if (i % 1000 == 0)
                System.out.println("" + i);
            System.gc();
        }
    }

    private static String constructLongString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(rand.nextInt(10));
        }
        sb.append(" ");
        for (int i = 0; i < 4096; i++) {
            sb.append(rand.nextInt(10));
        }
        return sb.toString();
    }
}

If you run it with -Xmx60M, it runs out of memory after about 6000+ iterations. If you use code line 2 and comment out line 1, it runs well past 6000.

Answer 4

You need to use some kind of stream reader rather than abusing memory with huge Strings. Here is an example:

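import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class StreamReadExample {
    public static void readString(String str) throws IOException {
        // For demonstration, the String is wrapped in an in-memory stream.
        InputStream is = new ByteArrayInputStream(str.getBytes("UTF-8"));

        char[] buf = new char[2048];
        Reader r = new InputStreamReader(is, "UTF-8");

        // Read fixed-size chunks until the end of the stream.
        while (true) {
            int n = r.read(buf);
            if (n < 0)
                break;

            /*
             StringBuilder s = new StringBuilder();
             s.append(buf, 0, n);
             ... now you can parse the StringBuilder ...
            */
        }
    }
}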
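
In the same spirit of handling one piece at a time, a different technique (not the reader-based one shown above) is to scan an in-memory String with indexOf and pass each piece to a callback, so the full String[] is never built. This is only a sketch: the class name IndexOfSplitSketch and the method forEachPart are made up for illustration, and it handles a single delimiter character rather than a regex. Unlike String.split(), it also reports trailing empty pieces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class IndexOfSplitSketch {
    // Walks the input once and hands each piece to the consumer.
    static void forEachPart(String input, char delimiter, Consumer<String> consumer) {
        int start = 0;
        int end;
        while ((end = input.indexOf(delimiter, start)) >= 0) {
            consumer.accept(input.substring(start, end));
            start = end + 1; // continue after the delimiter
        }
        consumer.accept(input.substring(start)); // last piece (may be empty)
    }

    public static void main(String[] args) {
        // Collected into a list here only to show the result.
        List<String> seen = new ArrayList<>();
        forEachPart("abc,def,,ghi", ',', seen::add);
        System.out.println(seen); // prints [abc, def, , ghi]
    }
}
```

If only the first field is needed, the loop can be replaced by a single indexOf call followed by one substring.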

Memory problems with String.split()