Regex: Nth match, or last match if there are fewer than N matches

Date: 2011-06-10 16:55:39

Tags: regex

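I'm trying to find the nth match, or the last match if there are fewer than n. n is determined in my program, and the regex string is built by replacing 'n' with an integer.

Here is my best guess, but my repetition operator {1,n} always matches just once. I thought it would be greedy by default.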

The basic regex would be:
distinctiveString[\s\S]*?value="([^"]*)"

So I modified it to this, to try to get the nth one instead:
(?:distinctiveString[\s\S]*?){1,n}value="([^"]*)"

distinctiveString randomStuff value="val1"
moreRandomStuff
distinctiveString randomStuff value="val2"
moreRandomStuff
distinctiveString randomStuff value="val3"
moreRandomStuff
distinctiveString randomStuff value="val4"
moreRandomStuff
distinctiveString randomStuff value="val5"

So in this case, with n = 2 I want 'val2', with n = 5 'val5', and with n = 8 also 'val5'.

I'm passing my regex through an application layer, but I believe it is handed straight to Perl as is.
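For reference, the behaviour asked for can be pinned down without one clever regex: loop over the basic pattern with Matcher.find() and keep the last value seen, stopping after n matches. This is a minimal Java sketch that assumes the selection can happen in application code, which may not be possible if the regex really is passed straight to Perl; the names NthOrLast and nthOrLast are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NthOrLast {

    // The question's "basic regex": a marker, then lazily anything, then value="...".
    static final Pattern BASIC =
            Pattern.compile("distinctiveString[\\s\\S]*?value=\"([^\"]*)\"");

    // Returns group 1 of the n-th match (1-based), or of the last match if
    // there are fewer than n matches; returns null if nothing matches at all.
    static String nthOrLast(String text, int n) {
        Matcher m = BASIC.matcher(text);
        String last = null;
        int count = 0;
        // Stop as soon as n matches have been seen, or when find() runs out.
        while (count < n && m.find()) {
            last = m.group(1);
            count++;
        }
        return last;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "distinctiveString randomStuff value=\"val1\"",
                "moreRandomStuff",
                "distinctiveString randomStuff value=\"val2\"",
                "moreRandomStuff",
                "distinctiveString randomStuff value=\"val3\"",
                "moreRandomStuff",
                "distinctiveString randomStuff value=\"val4\"",
                "moreRandomStuff",
                "distinctiveString randomStuff value=\"val5\"");
        System.out.println(nthOrLast(text, 2)); // val2
        System.out.println(nthOrLast(text, 5)); // val5
        System.out.println(nthOrLast(text, 8)); // val5
    }
}
```

A loop like this is also a convenient oracle for checking any single-regex solution against.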

1 Answer:

Answer 0 (score: 2)

Try something like this:

(?:(?:[\s\S]*?distinctiveString){4}[\s\S]*?|(?:[\s\S]*distinctiveString)[\s\S]*?)value="([^"]*)"

For the input in the question, match group 1 holds "val4"; for the following input it holds "val3":

distinctiveString randomStuff value="val1"
moreRandomStuff
distinctiveString randomStuff value="val2"
moreRandomStuff
distinctiveString randomStuff value="val3"
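To check that claim, the pattern can be compiled verbatim in Java (backslashes and quotes are only escaped for the string literal). This is a small harness sketch; FixedCountDemo, group1() and the sample() helper, which builds input with a given number of entries, are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedCountDemo {

    // The answer's pattern with n hard-coded to 4, copied as-is.
    static final Pattern NTH_OR_LAST_4 = Pattern.compile(
            "(?:(?:[\\s\\S]*?distinctiveString){4}[\\s\\S]*?"
            + "|(?:[\\s\\S]*distinctiveString)[\\s\\S]*?)"
            + "value=\"([^\"]*)\"");

    // Builds input with `count` entries: value="val1" ... value="val<count>".
    static String sample(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            sb.append("distinctiveString randomStuff value=\"val").append(i).append("\"\n");
            sb.append("moreRandomStuff\n");
        }
        return sb.toString();
    }

    // Returns capture group 1 of the first match, or null if there is none.
    static String group1(String text) {
        Matcher m = NTH_OR_LAST_4.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(group1(sample(5))); // val4: first alternative
        System.out.println(group1(sample(3))); // val3: falls back to the last marker
    }
}
```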

A quick breakdown of the pattern:

(?:                                         #
  (?:[\s\S]*?distinctiveString){4}[\s\S]*?  # match 4 'distinctiveString's
  |                                         # OR
  (?:[\s\S]*distinctiveString)[\s\S]*?      # match the last 'distinctiveString'
)                                           #
value="([^"]*)"                             #

Looking at your profile, you seem to be most active in the Java tag, so here's a small Java demo:

import java.util.regex.*;

public class Main {

    private static String getNthMatch(int n, String text, String distinctive) {
        String regex = String.format(
                "(?xs)                 # enable comments and dot-all           \n" +
                "(?:                   # start non-capturing group 1           \n" +
                "  (?:.*?%s){%d}       #   match n 'distinctive' strings       \n" +
                "  |                   #   OR                                  \n" +
                "  (?:.*%s)            #   match the last 'distinctive' string \n" +
                ")                     # end non-capturing group 1             \n" +
                ".*?value=\"([^\"]*)\" # match the value                       \n",
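                // Pattern.quote makes the marker literal, so regex metacharacters and '#'
                // (which starts a comment in (?x) mode) in it are not interpreted
                Pattern.quote(distinctive), n, Pattern.quote(distinctive)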
        );
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        String text = "distinctiveString randomStuff value=\"val1\" \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val2\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val3\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val4\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val5\"         ";

        String distinctive = "distinctiveString";

        System.out.println(getNthMatch(4, text, distinctive));
        System.out.println(getNthMatch(5, text, distinctive));
        System.out.println(getNthMatch(6, text, distinctive));
        System.out.println(getNthMatch(7, text, distinctive));
    }
}

This prints the following to the console:

val4
val5
val5
val5

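Note that once the DOTALL option (?s) is enabled, the dot (.) matches exactly what [\s\S] matches: any character, including line breaks.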
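A minimal Java illustration of how . and [\s\S] differ without (?s) and agree once it is on, using Pattern.matches on a two-line string; DotAllDemo and TEXT are illustrative names.

```java
import java.util.regex.Pattern;

public class DotAllDemo {

    // Two characters separated by a line break.
    static final String TEXT = "a\nb";

    public static void main(String[] args) {
        // Without DOTALL, '.' does not match a line terminator.
        System.out.println(Pattern.matches("a.b", TEXT));         // false
        // [\s\S] ("whitespace or non-whitespace") matches any character.
        System.out.println(Pattern.matches("a[\\s\\S]b", TEXT));  // true
        // The inline (?s) flag lets '.' match '\n' as well.
        System.out.println(Pattern.matches("(?s)a.b", TEXT));     // true
    }
}
```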

Edit

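Yes, {1,n} is greedy. However, because you put the [\s\S]*? after distinctiveString, (?:distinctiveString[\s\S]*?){1,3} matches distinctiveString followed by reluctantly zero or more characters (it tries zero first), and then repeats that 1 to 3 times. The reluctant [\s\S]*? only grows one character at a time until the rest of the pattern can match, and value="val1" is reached before a second distinctiveString ever is, so the match always ends at the first value. What you want to do is move the [\s\S]*? in front of distinctiveString: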

(?:[\s\S]*?distinctiveString){1,n}[\s\S]*?value="([^"]*)"

This also prints:

val4
val5
val5
val5
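Below is a Java sketch that runs the question's pattern and the edited one side by side, building the quantifier from n as the question describes. It assumes the edited form is (?:[\s\S]*?distinctiveString){1,n}[\s\S]*?value="([^"]*)", i.e. with a second [\s\S]*? before value= to skip the text between the marker and the attribute; NthOrLastRegex, original() and fixed() are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NthOrLastRegex {

    // Runs `regex` once and returns capture group 1, or null if no match.
    private static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Question's pattern: the lazy [\s\S]*? sits AFTER the marker.
    static String original(int n, String text) {
        String regex = "(?:distinctiveString[\\s\\S]*?){1," + n + "}value=\"([^\"]*)\"";
        return firstGroup(regex, text);
    }

    // Edited pattern: the lazy [\s\S]*? sits BEFORE the marker, so every
    // repetition must consume one more marker. The greedy {1,n} takes up to n
    // of them, or all that exist when there are fewer than n.
    static String fixed(int n, String text) {
        String regex = "(?:[\\s\\S]*?distinctiveString){1," + n + "}[\\s\\S]*?value=\"([^\"]*)\"";
        return firstGroup(regex, text);
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "distinctiveString randomStuff value=\"val1\"",
                "moreRandomStuff",
                "distinctiveString randomStuff value=\"val2\"",
                "moreRandomStuff",
                "distinctiveString randomStuff value=\"val3\"",
                "moreRandomStuff",
                "distinctiveString randomStuff value=\"val4\"",
                "moreRandomStuff",
                "distinctiveString randomStuff value=\"val5\"");
        for (int n = 4; n <= 7; n++) {
            System.out.println("n=" + n + ": original=" + original(n, text)
                    + ", fixed=" + fixed(n, text));
        }
    }
}
```

For this input it reports original=val1 at every n, and fixed=val4, val5, val5, val5 for n = 4 to 7. Because the greedy {1,n} settles for fewer repetitions when no further distinctiveString can be found, this one pattern covers both the "nth" and the "last" case without the alternation used earlier.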