Question

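Background: I am trying to use regexp to extract different types of parts from a large array. The work is done in an AsyncTask. part.plainname is a string of at most 256 characters. item_pattern looks like "^keyword.*?$".

Problem: I tracked the slowdown to this method: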

public int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, item_pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public boolean testItem(String testString, String item_pattern){
    Pattern p = Pattern.compile(item_pattern);
    Matcher m = p.matcher(testString);
    return m.matches();
}

There are only 950 parts, but it runs very slowly:

02-25 11:34:51.773    1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP2

02-25 11:35:18.094    1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP3

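That is over 20 seconds just for counting. testItem is used heavily, roughly 15 * parts calls, so the whole app needs more than 15 minutes to finish, while an almost identical Java program (not an Android app) completes in under 30 seconds.

Question: What am I doing wrong? Why does a simple regex operation take so long?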

Answer 1

You can precompile the pattern:

public static int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    Pattern pattern = Pattern.compile(item_pattern);
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public static boolean testItem(String testString, Pattern pattern){
    Matcher m = pattern.matcher(testString);
    return m.matches();
}

Answer 2

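If you are looking for strings that start with a keyword, you don't need the matches method with a pattern like ^keyword.*?$:

First, the non-greedy quantifier is useless and may slow the regex engine down for nothing; a greedy quantifier gives you the same result.
Since the matches method is anchored by default, the anchors are not needed and you can remove them.
You are only interested in the start of the string, so in this case the lookingAt method is more appropriate, since it doesn't care what happens at the end of the string.
As the other answers point out, if you use the same pattern several times, try to compile it once outside the testItem function. If that is not the case, don't compile it at all.
If keyword is a literal string and not a subpattern, don't use a regex at all; use indexOf to check whether the keyword is at index 0.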
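The points above can be sketched roughly as follows. This is only an illustration: the keyword "cpu" and the class name PrefixCheck are made up, and it assumes the names contain no line breaks (otherwise `.` in the original pattern would behave differently). For a literal keyword, String.startsWith gives the same answer as checking indexOf(keyword) == 0, without scanning the rest of the string when the prefix does not match.

```java
import java.util.regex.Pattern;

class PrefixCheck {
    // Hypothetical keyword for illustration; the question does not name one.
    static final String KEYWORD = "cpu";

    // Compiled once. No "^", ".*?" or "$" needed: lookingAt() only
    // requires a match at the start of the input.
    // Pattern.quote treats the keyword as a literal, not a subpattern.
    static final Pattern STARTS_WITH_KEYWORD =
            Pattern.compile(Pattern.quote(KEYWORD));

    // Regex variant: counts names that begin with the keyword.
    static int countWithLookingAt(String[] names) {
        int count = 0;
        for (String name : names) {
            if (STARTS_WITH_KEYWORD.matcher(name).lookingAt()) {
                ++count;
            }
        }
        return count;
    }

    // Plain-string variant for a literal keyword: no regex involved.
    static int countWithStartsWith(String[] names) {
        int count = 0;
        for (String name : names) {
            if (name.startsWith(KEYWORD)) {
                ++count;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"cpu intel i5", "gpu card", "cpu amd", "my cpu"};
        System.out.println(countWithLookingAt(names));  // prints 2
        System.out.println(countWithStartsWith(names)); // prints 2
    }
}
```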

Answer 3

There is no need to compile the pattern every time. Instead, do it once at initialization.
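Since the question mentions that testItem is called with about 15 different patterns, one way to "compile once" without changing the callers might be a small cache keyed by the pattern string. This is a sketch under that assumption; the PatternCache class and its get method are hypothetical names, not part of the original code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class PatternCache {
    // Each distinct regex string is compiled at most once.
    private static final Map<String, Pattern> CACHE = new HashMap<>();

    // synchronized so the cache can be shared between threads
    // (for example several background tasks).
    static synchronized Pattern get(String regex) {
        Pattern p = CACHE.get(regex);
        if (p == null) {
            p = Pattern.compile(regex);
            CACHE.put(regex, p);
        }
        return p;
    }

    // Same signature as the testItem in the question,
    // but reuses the compiled Pattern on repeated calls.
    static boolean testItem(String testString, String itemPattern) {
        return get(itemPattern).matcher(testString).matches();
    }

    public static void main(String[] args) {
        System.out.println(testItem("keyword abc", "^keyword.*$")); // true
        System.out.println(testItem("other abc", "^keyword.*$"));   // false
    }
}
```

A compiled Pattern is immutable and safe to share; only the Matcher objects created from it are per-use.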

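However, because of their generality, regular expressions are not fast, and they were not designed to be. If the data is regular enough, it is better to use specific string-splitting techniques.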

Answer 4

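Regular expressions are generally slow because a lot is involved in constructing them (synchronization, for example).

Don't call a separate method inside the loop (this may prevent some optimizations). Let the VM optimize the for loop. Try this and check the performance: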

    Pattern p = Pattern.compile(item_pattern); // compile the pattern only once
    int casecount = 0;
    for (NFItem part : parts) {
        // match inline with the precompiled pattern instead of calling testItem
        Matcher m = p.matcher(part.plainname);
        if (m.matches())
            ++casecount;
    }
    ...
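One way to check the performance as suggested is a rough timing comparison with System.nanoTime. The sketch below uses synthetic names (950 strings, half of them starting with "keyword") because the real data is not shown in the question; the class and method names are made up. The timings will vary by device and VM, so treat them as indicative only.

```java
import java.util.regex.Pattern;

class RegexTiming {
    // Mirrors the original approach: compile the pattern for every item.
    static int countCompilingEachTime(String[] names, String regex) {
        int count = 0;
        for (String name : names) {
            if (Pattern.compile(regex).matcher(name).matches()) {
                ++count;
            }
        }
        return count;
    }

    // Compiles the pattern once, then reuses it inside the loop.
    static int countPrecompiled(String[] names, String regex) {
        Pattern p = Pattern.compile(regex);
        int count = 0;
        for (String name : names) {
            if (p.matcher(name).matches()) {
                ++count;
            }
        }
        return count;
    }

    // Synthetic stand-in for part.plainname values.
    static String[] makeNames(int n) {
        String[] names = new String[n];
        for (int i = 0; i < n; i++) {
            names[i] = (i % 2 == 0 ? "keyword item " : "other item ") + i;
        }
        return names;
    }

    public static void main(String[] args) {
        String[] names = makeNames(950);
        String regex = "^keyword.*$";

        long t0 = System.nanoTime();
        int a = countCompilingEachTime(names, regex);
        long t1 = System.nanoTime();
        int b = countPrecompiled(names, regex);
        long t2 = System.nanoTime();

        System.out.println("compile per item: " + a + " matches, "
                + (t1 - t0) / 1000 + " us");
        System.out.println("precompiled:      " + b + " matches, "
                + (t2 - t1) / 1000 + " us");
    }
}
```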

Regular expressions are very slow
