Question

I want to use Java to extract a substring between two words.

For example:

This is an important example about regex for my work.

I want to extract everything between "an" and "for".

What I have done so far is:

String sentence = "This is an important example about regex for my work and for me";
Pattern pattern = Pattern.compile("(?<=an).*.(?=for)");
Matcher matcher = pattern.matcher(sentence);

boolean found = false;
while (matcher.find()) {
    System.out.println("I found the text: " + matcher.group().toString());
    found = true;
}
if (!found) {
    System.out.println("I didn't find the text");
}

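This works well.

But I want to do two more things:

1. If the sentence is: This is an important example about regex for my work and for me. I want to extract up to the first "for", i.e. important example about regex
2. Sometimes I want to limit the number of words between the two patterns to 3, i.e. important example about

Any ideas?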

Answer 1

For your first question, make the quantifier lazy. Put a question mark after the quantifier and it will match as little as possible.

(?<=an).*?(?=for)
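To make the difference concrete, here is a minimal sketch that runs the greedy and the lazy variant against the sentence from the question. The class name `LazyVsGreedy` and the helper `firstMatch` are made up for this illustration; the square brackets in the output only make the leading and trailing spaces visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {

    // Returns the first match of the regex in the text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";

        // Greedy: .* consumes as much as possible, so the match ends at the LAST "for".
        System.out.println("[" + firstMatch("(?<=an).*(?=for)", sentence) + "]");
        // prints "[ important example about regex for my work and ]"

        // Lazy: .*? consumes as little as possible, so the match ends at the FIRST "for".
        System.out.println("[" + firstMatch("(?<=an).*?(?=for)", sentence) + "]");
        // prints "[ important example about regex ]"
    }
}
```

Both results keep the spaces next to "an" and "for", because the lookarounds do not consume them; call `trim()` on the result if they are unwanted.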

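I don't know what the extra . at the end of .*. is good for.

For your second question, you have to define what a "word" is. Here I would say it is probably just a sequence of non-whitespace characters followed by a whitespace character. Something like this:

\S+\s

and repeat it 3 times like this:

(?<=an)\s(\S+\s){3}(?=for)

To make sure the pattern matches only whole words, use word boundaries:

(?<=\ban\b)\s(\S+\s){1,5}(?=\bfor\b)

See it on Regexr.

{3} matches exactly 3 repetitions; for a minimum of 1 and a maximum of 3, use {1,3}.

Alternative:

As dma_k correctly stated, in your case it is not necessary to use lookbehind and lookahead. See the Matcher documentation about groups.

You can use a capturing group instead. Just put the part you want to extract in parentheses and it will be placed in a capturing group.

\ban\b(.*?)\bfor\b

See it on Regexr.

You can then access the group like this:

System.out.println("I found the text: " + matcher.group(1).toString());
                                                        ^

You have only one pair of parentheses, so it is simple: pass 1 to matcher.group(1) to access the first capturing group.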
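As a rough sketch of how word boundaries, a bounded repetition and a capturing group might be combined, consider the following. The class `WordLimitDemo`, the helper `between` and its `maxWords` parameter are hypothetical names chosen for this example. Note that the bound applies to everything between the two words: if more words than the limit sit between "an" and "for", the pattern does not match at all rather than returning the first few words.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordLimitDemo {

    // Returns the text between the whole words "an" and "for" if it consists of
    // 1 to maxWords whitespace-separated words; returns null otherwise.
    static String between(String sentence, int maxWords) {
        Pattern pattern = Pattern.compile(
                "\\ban\\b\\s((?:\\S+\\s){1," + maxWords + "})\\bfor\\b");
        Matcher matcher = pattern.matcher(sentence);
        // group(1) holds the captured words, including one trailing whitespace character.
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";

        // Four words lie between "an" and the first "for", so a limit of 5 matches.
        System.out.println(between(sentence, 5)); // prints "important example about regex"

        // A limit of 3 is too small for four words, so there is no match.
        System.out.println(between(sentence, 3)); // prints "null"
    }
}
```

The word boundaries keep the "an" inside "important" and "and" from being treated as the start word.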

Answer 2

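Your regex is "an\\s+(.*?)\\s+for". It extracts all characters between "an" and "for", ignoring the surrounding whitespace (\s+). The question mark makes the quantifier lazy (non-greedy). It is needed to prevent the pattern .* from consuming as much as possible, including the first "for".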
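A minimal sketch of that regex in use is shown below; the class name `CaptureGroupDemo` and the helper `extract` are invented for the example. Because `\s+` sits outside the parentheses, the captured text has no surrounding spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {

    // Returns the text captured between "an" and the first following "for",
    // or null if the sentence does not contain such a pair.
    static String extract(String sentence) {
        Matcher matcher = Pattern.compile("an\\s+(.*?)\\s+for").matcher(sentence);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";
        System.out.println(extract(sentence)); // prints "important example about regex"
    }
}
```

This pattern has no word boundaries, so an "an" at the end of a longer word (for example "plan") followed by whitespace would also be accepted as the start.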

Answer 3

public class SubStringBetween {

    /**
     * Returns the text between the first occurrence of "before" and the next
     * occurrence of "after", or an empty string if either delimiter is missing.
     * Delimiters are matched as plain substrings, not as whole words.
     */
    public static String subStringBetween(String sentence, String before, String after) {
        int startSub = subStringStartIndex(sentence, before);
        if (startSub < 0) {
            return "";
        }
        int stopSub = subStringEndIndex(sentence, after, startSub);
        if (stopSub < 0) {
            return "";
        }
        return sentence.substring(startSub, stopSub);
    }

    /** Returns the index just past the first occurrence of the delimiter, or -1 if it is absent. */
    public static int subStringStartIndex(String sentence, String delimiterBeforeWord) {
        int index = sentence.indexOf(delimiterBeforeWord);
        return index < 0 ? -1 : index + delimiterBeforeWord.length();
    }

    /** Returns the index where the delimiter first starts at or after fromIndex, or -1 if it is absent. */
    public static int subStringEndIndex(String sentence, String delimiterAfterWord, int fromIndex) {
        return sentence.indexOf(delimiterAfterWord, fromIndex);
    }
}

Extract a substring between two specific words using regex in Java

3 answers: