Match any character except '>' unless preceded by '%'?

Time: 2017-09-16 13:37:28

Tags: java regex

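What is the fastest way (in processing time) to match any string unless it contains >? However, if the > is preceded by % (i.e. %>), that is fine.

I want to match "dhg87y93..r,y9w", "dhkajdah%>daadas%>", "adsdsa %>/r/n (line breaks)%>", and even "", but not "adhajs>dadsadas".

I tried ([^>]*(%>)?[^>]*)*, but even though it works, it takes far too much processing power.

Thanks!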

4 answers:

Answer 0: (score: 3)

Pretend that %> is a single character:

([^>]|%>)*
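
A quick way to try this pattern on the examples from the question is a small Java sketch. The class name `SingleTokenCheck` and the helper `accepts` are made up for illustration, and the `/r/n` of the question is assumed to mean a real CR LF pair. `Matcher.matches()` requires the whole input to match, so no `^` or `$` is needed.

```java
import java.util.regex.Pattern;

class SingleTokenCheck {
    // Each repetition consumes either one character other than '>' or the two-character token "%>".
    private static final Pattern ALLOWED = Pattern.compile("([^>]|%>)*");

    // Hypothetical helper: true if every '>' in s is directly preceded by '%'.
    static boolean accepts(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "dhg87y93..r,y9w",
            "dhkajdah%>daadas%>",
            "adsdsa %>\r\n%>",   // a negated class such as [^>] also matches line breaks
            "",
            "adhajs>dadsadas"
        };
        for (String s : samples) {
            System.out.println(accepts(s) + " <- \"" + s.replace("\r\n", "\\r\\n") + "\"");
        }
    }
}
```

Only the last sample contains a > that is not part of %>, so it is the only one expected to be rejected.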

Answer 1: (score: 3)

Try ^([^%>]|%>?)*$. The part before the | matches everything other than % and >. The second part matches a %, and then allows a > after it.
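
The two branches can be exercised on a few edge cases with a short Java sketch. The class name `PercentBranchCheck`, the helper `accepts`, and the sample inputs are illustrative assumptions, not part of the answer.

```java
import java.util.regex.Pattern;

class PercentBranchCheck {
    // Branch 1: any character other than '%' and '>'.
    // Branch 2: a '%', optionally followed by '>'.
    private static final Pattern ALLOWED = Pattern.compile("([^%>]|%>?)*");

    // Hypothetical helper: full-string match against the pattern above.
    static boolean accepts(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // A lone '%' and a doubled '%' go through branch 2 with the optional '>' left out.
        String[] samples = {"100% sure", "%%>", "a%>b", "%>>", ">"};
        for (String s : samples) {
            System.out.println(accepts(s) + " <- \"" + s + "\"");
        }
    }
}
```

Because a > can only ever be consumed right after a %, the inputs "%>>" and ">" are rejected while the other three are accepted.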

Answer 2: (score: 3)

You can use this regex in Java:

// anchors are implicit in String.matches(regex)
boolean isMatch = str.matches("[^>]*(?:%>[^>]*)*");

RegEx Demo

RegEx breakup:

^                # start (implicit in matches())
[^>]*            # match zero or more of any character except >
(?:              # start of non-capture group
   %>            # match %>
   [^>]*         # match zero or more of any character except >
)*               # end of non-capture group. Match zero or more of this group
$                # end (implicit in matches())

Total steps taken: 85

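
The breakup above can be kept next to the pattern itself by compiling it with `Pattern.COMMENTS`, which ignores whitespace and `#` comments inside the regex. This is a minimal sketch; the class name `UnrolledCheck` and the helper `accepts` are hypothetical.

```java
import java.util.regex.Pattern;

class UnrolledCheck {
    // Same regex as in the answer, written in free-spacing form and compiled once.
    private static final Pattern UNROLLED = Pattern.compile(
        "[^>]*      # any run of characters other than >\n" +
        "(?:        # start of non-capturing group\n" +
        "   %>      # a > is only consumed as part of the token %>\n" +
        "   [^>]*   # followed by another run without >\n" +
        ")*         # repeat the group zero or more times\n",
        Pattern.COMMENTS);

    // Hypothetical helper: matches() anchors the pattern at both ends of the input.
    static boolean accepts(String s) {
        return UNROLLED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"dhg87y93..r,y9w", "dhkajdah%>daadas%>", "", "adhajs>dadsadas"};
        for (String s : samples) {
            System.out.println(accepts(s) + " <- \"" + s + "\"");
        }
    }
}
```

Holding the compiled `Pattern` in a static field also avoids recompiling the regex on every call, which `String.matches` does.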

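Answer 3: (score: 1)

I've added this answer to caution people against trusting answers like "this is the fastest regex...". Also, the number of steps reported by sites such as regex101 is an indication, not an absolute number that guarantees how fast a particular regex will match. I've put together a scratch file that uses all the examples from the question and all the regexes from the answers.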

import java.util.Arrays;
import java.util.List;

public class scratch_5 {
    public static void main(String[] args) {

        // sample strings from the question; only the last one must not match
        List<String> tests = Arrays.asList(
            "dhg87y93..r,y9w",
            "dhkajdah%>daadas%>",
            "adsdsa %>/r/n%>",
            "but not \"adhajs>dadsadas");
        // regexes from the answers
        List<String> patterns = Arrays.asList(
            "([^%>]|%>?)*",      // Leo Aso
            "[^>]*(?:%>[^>]*)*", // anubhava
            "([^>]|%>)*");       // John Kugelman

        int i = 0;
        for (String test : tests) {
            System.out.println("string " + test);
            // underline as long as the line above ("string " is 7 characters)
            System.out.println(new String(new char[test.length() + 7]).replace("\0", "="));
            for (String pattern : patterns) {
                // String.matches() compiles the pattern on every call, so compilation is timed as well
                long startTime = System.nanoTime();
                boolean res = test.matches(pattern);
                long endTime = System.nanoTime();
                long duration = (endTime - startTime);
                System.out.format("with pattern %d: %s with duration %d\n", (i++ % 3) + 1, res, duration);
            }
            System.out.println();
        }
    }
}

Running this gives:

string dhg87y93..r,y9w
======================
with pattern 1: true with duration 584676
with pattern 2: true with duration 45438
with pattern 3: true with duration 36220

string dhkajdah%>daadas%>
=========================
with pattern 1: true with duration 56894
with pattern 2: true with duration 59195
with pattern 3: true with duration 73102

string adsdsa %>/r/n%>
======================
with pattern 1: true with duration 63597
with pattern 2: true with duration 49039
with pattern 3: true with duration 34486

string but not "adhajs>dadsadas
===============================
with pattern 1: false with duration 58285
with pattern 2: false with duration 39279
with pattern 3: false with duration 42053

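We need to ignore the first result of the first test; its length is caused by initialization. We can conclude that, on average, the second regex is the fastest, but that is not always the case. Which pattern is faster depends on the string you are matching. So the correct answer to the question is: it depends.

To be absolutely sure how fast one regex parses a particular string compared to another, you need to know the strategy the parser is using.

Addendum 1: if you compile the patterns, the results differ yet again.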

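    // replaces the loop in main() above; needs java.util.regex.Matcher and java.util.regex.Pattern
    Matcher matcher;
    Pattern cp;
    for (String test : tests) {
        System.out.println("string " + test);
        System.out.println(new String(new char[test.length() + 7]).replace("\0", "="));
        for (String pattern : patterns) {
            cp = Pattern.compile(pattern); // compilation is outside the timed region
            long startTime = System.nanoTime();
            matcher = cp.matcher(test);    // note: only creating the Matcher is timed here
            long endTime = System.nanoTime();
            long duration = (endTime - startTime);
            // find() runs after the timing and accepts a partial match,
            // which is why the string with a bare > also reports true below
            System.out.format("with pattern %d: %s with duration %d\n", (i++ % 3) + 1, matcher.find(), duration);
        }
        System.out.println();
    }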

The results are:

string dhg87y93..r,y9w
======================
with pattern 1: true with duration 39342
with pattern 2: true with duration 2296
with pattern 3: true with duration 1520

string dhkajdah%>daadas%>
=========================
with pattern 1: true with duration 2365
with pattern 2: true with duration 2428
with pattern 3: true with duration 2452

string adsdsa %>/r/n%>
======================
with pattern 1: true with duration 2449
with pattern 2: true with duration 2147
with pattern 3: true with duration 1505

string but not "adhajs>dadsadas
===============================
with pattern 1: true with duration 1663
with pattern 2: true with duration 1569
with pattern 3: true with duration 2003

One thing is clear: if you need speed, compile your patterns. I guess that is preaching to the choir. ;-)
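
For a somewhat fairer comparison, the timed region can cover the actual full-string match, with the pattern compiled once and a warm-up pass run first. The following is only a naive sketch under those assumptions: the class name `CompiledBench`, the helper `averageNanos`, and the round count are made up, and a harness such as JMH would be more trustworthy for real measurements.

```java
import java.util.regex.Pattern;

class CompiledBench {
    // Average time in nanoseconds of one full-string match.
    // The pattern is compiled by the caller, so compilation is not timed.
    static double averageNanos(Pattern p, String input, int rounds) {
        boolean sink = false;
        for (int i = 0; i < rounds; i++) {          // warm-up pass, not timed
            sink ^= p.matcher(input).matches();
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {          // timed pass: matcher creation plus matches()
            sink ^= p.matcher(input).matches();
        }
        long elapsed = System.nanoTime() - start;
        if (sink) {
            System.out.print("");                   // use the result so the loops are not trivially dead
        }
        return (double) elapsed / rounds;
    }

    public static void main(String[] args) {
        String[] tests = {
            "dhg87y93..r,y9w", "dhkajdah%>daadas%>", "adsdsa %>/r/n%>", "but not \"adhajs>dadsadas"
        };
        String[] regexes = {"([^%>]|%>?)*", "[^>]*(?:%>[^>]*)*", "([^>]|%>)*"};
        for (String test : tests) {
            System.out.println("string " + test);
            for (String regex : regexes) {
                Pattern p = Pattern.compile(regex);
                System.out.format("  %-20s matches=%b avg=%.1f ns%n",
                        regex, p.matcher(test).matches(), averageNanos(p, test, 100_000));
            }
        }
    }
}
```

Unlike `find()`, `matches()` requires the whole string to match, so the last test string is reported as a non-match here. The absolute numbers will vary between machines and JVM versions.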