Question

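Thanks in advance.

I just solved Project Euler #22, a problem that involves reading about 5,000 lines of text from a file and determining the value of each name from the sum of its characters and its position in alphabetical order.

However, the code takes about 5-10 seconds to run, which is a bit annoying. What is the best way to optimize this code? I am currently using a Scanner to read the file into a String. Is there another, more efficient way to do this? (I tried using a BufferedReader, but that was even slower.)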

public static int P22(){


    String s = ""; // start empty: appending to null would prepend the text "null"

    try{
        //create a new Scanner to read file
        Scanner in = new Scanner(new File("names.txt"));
        while(in.hasNext()){
            //add the next line to the string
            s+=in.next();
        }

    }catch(Exception e){

    }
    //this just filters out the quotation marks surrounding all the names
    String r = "";
    for(int i = 0;i<s.length();i++){
        if(s.charAt(i) != '"'){
            r += s.charAt(i);
        }
    }
    //splits the string into an array, using the commas separating each name
    String text[] = r.split(",");
    Arrays.sort(text);



    int solution = 0;
    //go through each string in the array, summing its characters
    for(int i = 0;i<text.length;i++){
        int sum = 0;
        String name = text[i];
        for(int j = 0;j<name.length();j++){
            sum += (int)name.charAt(j)-64;
        }
        solution += sum*(i+1);
    }
    return solution;


}

Answer 1

If you are going to use a Scanner, why not use it for what it is meant to do (tokenizing)?

  Scanner in = new Scanner(new File("names.txt")).useDelimiter("[\",]+");
  ArrayList<String> text = new ArrayList<String>();
  while (in.hasNext()) {
    text.add(in.next());
  }
  Collections.sort(text);

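You don't need to strip the quotes, and you don't need to split on commas: Scanner does it all for you.

This snippet (including Java startup time) executes in 0.625 seconds (user time) on my machine. I suspect it should be a bit faster than what you are doing.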
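For completeness, here is a minimal sketch that combines the tokenizing snippet above with the scoring loop from the question. The class name `ScannerNames` and the method `totalScore` are made up for illustration, and the code assumes the input contains only uppercase names separated by quotes and commas, with no trailing newline.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

class ScannerNames {
    // Tokenizes on runs of quotes/commas, sorts the names,
    // and sums (1-based rank) * (sum of letter values, A=1 .. Z=26).
    static long totalScore(Scanner in) {
        in.useDelimiter("[\",]+");
        List<String> names = new ArrayList<>();
        while (in.hasNext()) {
            names.add(in.next());
        }
        Collections.sort(names);

        long total = 0;
        for (int i = 0; i < names.size(); i++) {
            int sum = 0;
            for (char c : names.get(i).toCharArray()) {
                sum += c - 'A' + 1;
            }
            total += (long) (i + 1) * sum;
        }
        return total;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "names.txt" is the file name used in the question.
        try (Scanner in = new Scanner(new File("names.txt"))) {
            System.out.println(totalScore(in));
        }
    }
}
```

Passing the Scanner in as a parameter keeps the logic testable with a Scanner built over a short String instead of the real file.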

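Edit: the OP asked what the string passed to useDelimiter is. It is a regular expression. Once you strip away the escaping that Java requires to put a quote character inside a string literal, it is [",]+, which means: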

[...]   character class: match any of these characters, so
[",]    match a quote or a comma
...+    one or more occurrence modifier, so
[",]+   match one or more of quotes or commas

Sequences that match this pattern include:

"
,
,,,,
""",,,",","

and indeed ",", which is what we are after here.
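If you want to check this yourself, the following small sketch compiles the same pattern with `java.util.regex.Pattern` and tests whether a whole string is a single delimiter run. The class name `DelimiterDemo` and the helper `isDelimiter` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DelimiterDemo {
    // Same pattern as the argument to useDelimiter, compiled once.
    static final Pattern DELIMITER = Pattern.compile("[\",]+");

    // True if the whole string is a single run of quotes and/or commas.
    static boolean isDelimiter(String s) {
        return DELIMITER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The sequences listed above, plus a name that should not match.
        List<String> samples = Arrays.asList(
                "\"", ",", ",,,,", "\"\"\",,,\",\",\"", "\",\"", "MARY");
        for (String s : samples) {
            System.out.println(s + " -> " + isDelimiter(s));
        }
    }
}
```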

Answer 2

Appending to a String with '+', as is done here:

/* That's actually not the problem since there is only one line. */
while(in.hasNext()){
    //add the next line to the string
    s+=in.next();
}

is slow, because it has to create a new String and copy everything on each iteration. Try using a StringBuilder,

StringBuilder sb = new StringBuilder();
while(in.hasNext()){
    sb.append(in.next());
}
s = sb.toString();

However, you shouldn't really read the file contents into a String at all; you should build a String[] or ArrayList<String> directly from the file contents,

int names = 5000; // use the correct number of lines in the file!
String[] sa = new String[names];
for(int i = 0; i < names; ++i){
    sa[i] = in.next();
}

However, on inspection it turns out that the file does not contain about 5000 lines but is all on one line, so your big problem is actually

/* This one is the problem! */
String r = "";
for(int i = 0;i<s.length();i++){
    if(s.charAt(i) != '"'){
        r += s.charAt(i);
    }
}

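Use a StringBuilder. Alternatively, have the Scanner read up to the next ',' and read directly into an ArrayList<String>, then remove the double quotes from each individual name in the ArrayList.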
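Both options could look roughly like this. It is only a sketch: the class name `QuoteStripping` and the method names are made up for illustration, and the input is assumed to be the single line of comma-separated, double-quoted names described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class QuoteStripping {
    // Option 1: copy every non-quote character into a StringBuilder.
    // Each append is amortized constant time, so the loop is linear overall.
    static String stripQuotes(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '"') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Option 2: let the Scanner split on commas, then strip the
    // double quotes from each individual name.
    static List<String> readNames(Scanner in) {
        in.useDelimiter(",");
        List<String> names = new ArrayList<>();
        while (in.hasNext()) {
            names.add(in.next().replace("\"", ""));
        }
        return names;
    }
}
```

Option 2 never builds the large intermediate String at all, which is the point of the advice above about not reading the whole file into a String.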

Answer 3

I suggest you run your code under a profiler. It lets you see which parts are really slow (IO, computation, etc.). If IO is slow, look into NIO: http://docs.oracle.com/javase/1.4.2/docs/guide/nio/.
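If you don't have a profiler at hand, a crude first check is to time the phases by hand. The sketch below is not a substitute for a profiler. It uses `java.nio.file.Files.readAllBytes`, which needs Java 7 or later and is newer than the NIO guide linked above. The class name `PhaseTiming` and the helper `readAll` are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PhaseTiming {
    // Reads the whole file in one call; the names are plain ASCII.
    static String readAll(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        String content = readAll(Paths.get("names.txt")); // file name from the question
        long t1 = System.nanoTime();
        String[] names = content.replace("\"", "").split(",");
        long t2 = System.nanoTime();

        // Single-run timings include JIT warm-up, so treat them as rough hints only.
        System.out.printf("read: %.3f ms, parse: %.3f ms (%d names)%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6, names.length);
    }
}
```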

Answer 4

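For this problem, 5+ seconds is very slow. My entire web application (600 Java classes) compiles in four seconds. The root of your problem is probably that a new String is allocated for every character in the file: r += s.charAt(i)

To really speed things up, you shouldn't use Strings at all. Get the file size, and read the whole thing into a byte array in a single I/O call: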

public class Names {
  private byte[] data;
  private class Name implements Comparable<Name> {
    private int start; // index into data
    private int length;
    public Name(int start, int length) { ...; }
    public int compareTo(Name arg0) {
      ...
    }
    public int score() { ... }
  }
  public Names(File file) throws Exception {
    data = new byte[(int) file.length()];
    new FileInputStream(file).read(data, 0, data.length);
  }
  public int score() {
    SortedSet<Name> names = new ...
    for (int i = 0; i < data.length; ++i) {
      // find limits of each name, add to the set
    }
    // Calculate total score...
  }
}
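One way the skeleton above could be filled in is sketched below. This is an illustration, not the answerer's actual code: the class name `ByteNames` is made up, it takes the bytes in its constructor so it can be tried without a file, and it sorts a List rather than using a SortedSet so that duplicate names, if there are any, are not collapsed. It assumes names consist only of the letters A-Z.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ByteNames {
    private final byte[] data;

    ByteNames(byte[] data) {
        this.data = data;
    }

    // A name is the slice [start, start + length) of data; no String is created.
    private final class Name implements Comparable<Name> {
        final int start;
        final int length;

        Name(int start, int length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public int compareTo(Name other) {
            int n = Math.min(length, other.length);
            for (int i = 0; i < n; i++) {
                int diff = data[start + i] - data[other.start + i];
                if (diff != 0) {
                    return diff;
                }
            }
            return length - other.length; // a prefix sorts before the longer name
        }

        int score() {
            int sum = 0;
            for (int i = 0; i < length; i++) {
                sum += data[start + i] - 'A' + 1;
            }
            return sum;
        }
    }

    long score() {
        List<Name> names = new ArrayList<>();
        int start = -1; // index of the first letter of the current name, or -1
        for (int i = 0; i <= data.length; i++) {
            boolean letter = i < data.length && data[i] >= 'A' && data[i] <= 'Z';
            if (letter && start < 0) {
                start = i;
            } else if (!letter && start >= 0) {
                names.add(new Name(start, i - start));
                start = -1;
            }
        }
        Collections.sort(names);

        long total = 0;
        for (int i = 0; i < names.size(); i++) {
            total += (long) (i + 1) * names.get(i).score();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("names.txt"));
        System.out.println(new ByteNames(bytes).score());
    }
}
```

`Files.readAllBytes` is used in `main` because a single `InputStream.read` call is not guaranteed to fill the whole array.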

Answer 5

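Depending on the application, StreamTokenizer is often considerably faster than Scanner. Examples comparing the two can be found here and here.

Addendum: Project Euler #22 involves computing a checksum of the characters in each token encountered. A custom analyzer could combine recognition and calculation, rather than traversing the tokens twice. The results would be stored in a SortedMap<String, Integer>, which can be iterated later to find the total.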
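A rough sketch of the StreamTokenizer idea follows. It is a simplification of what the addendum describes: instead of a custom analyzer, it relies on StreamTokenizer's default syntax, in which the double quote is a string-quote character, and computes each checksum as soon as the token is returned. The class name `TokenizerNames` and its methods are made up for illustration. Note that a map keeps only one entry per distinct name.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.util.SortedMap;
import java.util.TreeMap;

class TokenizerNames {
    // Maps each name to the sum of its letter values (A=1 .. Z=26).
    static SortedMap<String, Integer> readScores(Reader reader) {
        StreamTokenizer st = new StreamTokenizer(reader);
        SortedMap<String, Integer> scores = new TreeMap<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // A quoted string is reported with ttype equal to the quote
                // character; sval holds the text without the quotes.
                if (st.ttype == '"') {
                    int sum = 0;
                    for (int i = 0; i < st.sval.length(); i++) {
                        sum += st.sval.charAt(i) - 'A' + 1;
                    }
                    scores.put(st.sval, sum);
                }
                // Commas come back as ordinary single-character tokens and are skipped.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scores;
    }

    // Iterates in key (alphabetical) order, multiplying by the 1-based rank.
    static long total(SortedMap<String, Integer> scores) {
        long total = 0;
        int rank = 1;
        for (int score : scores.values()) {
            total += (long) rank * score;
            rank++;
        }
        return total;
    }
}
```

For the real file you would typically wrap a FileReader in a BufferedReader and pass that to `readScores`.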

Answer 6

A possibly interesting solution.

long start = System.nanoTime();
long sum = 0;
int runs = 10000;
for (int r = 0; r < runs; r++) {
    FileChannel channel = new FileInputStream("names.txt").getChannel();
    ByteBuffer bb = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    TLongArrayList values = new TLongArrayList();

    long wordId = 0;
    int shift = 63;
    while (true) {
        int b = bb.remaining() < 1 ? ',' : bb.get();
        if (b == ',') {
            values.add(wordId);
            wordId = 0;
            shift = 63;
            if (bb.remaining() < 1) break;

        } else if (b >= 'A' && b <= 'Z') {
            shift -= 5;
            long n = b - 'A' + 1;
            wordId = (wordId | (n << shift)) + n;

        } else if (b != '"') {
            throw new AssertionError("Unexpected ch '" + (char) b + "'");
        }
    }

    values.sort();

    sum = 0;
    for (int i = 0; i < values.size(); i++) {
        long wordSum = values.get(i) & ((1 << 8) - 1);
        sum += (i + 1) * wordSum;
    }
}
long time = System.nanoTime() - start;
System.out.printf("%d took %.3f ms%n", sum, time / 1e6);

prints

XXXXXXX took 27.817 ms.
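To make the encoding easier to follow, here is the same packing idea isolated into a small sketch that uses only the JDK (a `long[]` and `Arrays.sort` instead of the Trove `TLongArrayList`). Each letter takes 5 bits, filled from the top of the long downward, and the running letter sum is kept in the low 8 bits. The class name `PackedNames` and its methods are made up for illustration, and the scheme as written only holds for names of at most 11 uppercase letters whose letter sum does not exceed 255.

```java
import java.util.Arrays;

class PackedNames {
    // Packs an uppercase name into a long: 5 bits per letter from the top,
    // letter sum in the low 8 bits. Assumes at most 11 letters and sum <= 255.
    static long pack(String name) {
        long id = 0;
        int shift = 63;
        for (int i = 0; i < name.length(); i++) {
            shift -= 5;
            long n = name.charAt(i) - 'A' + 1;
            id = (id | (n << shift)) + n;
        }
        return id;
    }

    // Sorting the packed values orders the names alphabetically, because the
    // letters occupy the most significant bits.
    static long totalScore(String[] names) {
        long[] packed = new long[names.length];
        for (int i = 0; i < names.length; i++) {
            packed[i] = pack(names[i]);
        }
        Arrays.sort(packed);

        long total = 0;
        for (int i = 0; i < packed.length; i++) {
            long wordSum = packed[i] & 0xFF; // low 8 bits hold the letter sum
            total += (i + 1) * wordSum;
        }
        return total;
    }
}
```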

Optimizing Project Euler #22

6 answers: