Question

Given a test string:

I have a 1234 and a 2345 and maybe a 3456 id.

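I want to match all IDs (four-digit numbers) and, at the same time, capture up to 12 characters of the surrounding text (before and after), if that much text is available.

So the matches should be: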

             BEFORE       MATCH      AFTER
Match #1:   I have a-      1234    -and a 2345-
Match #2:   -1234 and a-   2345    -and maybe a
Match #3:   and maybe a-   3456    -id.

This (-) is a space character

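Note:

The BEFORE text of Match #1 is shorter than 12 characters (there are not that many characters before it at the start of the string). The same goes for the AFTER text of Match #3 (there are not that many characters after the last match).

Can I get these matches with a single regular expression in Java?

My best attempt so far uses a positive lookbehind and an atomic group (to capture the surrounding text), but it fails at the start and end of the string when there are not enough characters (as in my note above):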

(?<=(.{12}))(\d{4})(?>(.{12}))

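This only matches 2345. If I use a small enough value for the quantifier (e.g. 2 instead of 12), then all the IDs match correctly.

Here is a link to the regex playground where I have been trying my regex: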

http://regex101.com/r/cZ6wG4

Answer 1

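If you look at the MatchResult interface (http://docs.oracle.com/javase/7/docs/api/java/util/regex/MatchResult.html), which the Matcher class (http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html) implements, you will find the methods start() and end(). They give you the index of the first matched character and the index just past the last matched character in the input string. Once you have those indices, some simple arithmetic and substring calls will extract the parts you need.

I hope this helps, because I am not going to write the full code for you.

It may well be possible to do what you want purely with regex, but I think using the indices and substring is easier (and probably more reliable).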
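A minimal sketch of the index-and-substring approach described above. The class name, the helper `findWithContext`, and the `width` parameter are made up for illustration, and the plain `\d{4}` pattern assumes IDs are never part of a longer digit run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextBySubstring {
    // Hypothetical helper: returns {before, match, after} for every four-digit ID.
    static List<String[]> findWithContext(String input, int width) {
        List<String[]> results = new ArrayList<>();
        Matcher m = Pattern.compile("\\d{4}").matcher(input);
        while (m.find()) {
            // start() is the index of the first matched char,
            // end() is the index just past the last matched char.
            int from = Math.max(0, m.start() - width);          // clamp at string start
            int to = Math.min(input.length(), m.end() + width); // clamp at string end
            results.add(new String[] {
                input.substring(from, m.start()),
                m.group(),
                input.substring(m.end(), to)
            });
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "I have a 1234 and a 2345 and maybe a 3456 id.";
        for (String[] r : findWithContext(s, 12)) {
            System.out.println("[" + r[0] + "] " + r[1] + " [" + r[2] + "]");
        }
    }
}
```

Clamping with Math.max and Math.min is what handles the short BEFORE text of the first match and the short AFTER text of the last one.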

Answer 2

You can do this in a single regex:

Pattern regex = Pattern.compile("(?<=^.{0,10000}?(.{0,12}))(\\d+)(?=(.{0,12}))");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
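    String before = regexMatcher.group(1); // up to 12 chars before the ID
    String match = regexMatcher.group(2);  // the ID itself
    String after = regexMatcher.group(3);  // up to 12 chars after the ID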
}

Explanation

(?<=          # Assert that the following can be matched before current position
 ^.{0,10000}? # Match as few characters as possible from the start of the string
 (.{0,12})    # Match and capture up to 12 chars in group 1
)             # End of lookbehind
(\d+)         # Match and capture in group 2: Any number
(?=           # Assert that the following can be matched here:
 (.{0,12})    # Match and capture up to 12 chars in group 3
)             # End of lookahead
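
To check this pattern against the test string from the question, something like the following self-contained sketch could be used. The pattern is the one from this answer; the class and method names are only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleRegexContext {
    // Pattern copied from the answer; group 1 = before, 2 = ID, 3 = after.
    private static final Pattern ID_WITH_CONTEXT =
        Pattern.compile("(?<=^.{0,10000}?(.{0,12}))(\\d+)(?=(.{0,12}))");

    // Hypothetical helper: collects {before, match, after} for each ID found.
    static List<String[]> findWithContext(String input) {
        List<String[]> results = new ArrayList<>();
        Matcher m = ID_WITH_CONTEXT.matcher(input);
        while (m.find()) {
            results.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "I have a 1234 and a 2345 and maybe a 3456 id.";
        for (String[] r : findWithContext(s)) {
            System.out.println("[" + r[0] + "] " + r[1] + " [" + r[2] + "]");
        }
    }
}
```

Because the surrounding text is captured only inside lookarounds, the match itself consumes nothing but the digits, so the context of neighbouring IDs can overlap.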

Answer 3

You don't need a lookbehind or an atomic group, but you do need a lookahead:

(.{0,12}?)\b(\d+)\b(?=(.{0,12}))

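I'm assuming your IDs are not enclosed in longer words (hence the \b). I used a reluctant quantifier in the leading part ({0,12}?) to keep it from consuming more than one ID when they are close together, as in: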

I have a 1234, 2345 and 1456 id.
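
As a rough sketch, the pattern can be tried on that string as follows. The class and method names are invented for the example. Note that here the leading group is consumed as part of the match, so each BEFORE text starts no earlier than the end of the previous match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyPrefixContext {
    // Pattern from the answer: reluctant prefix, ID between word boundaries,
    // and a lookahead that captures the following text without consuming it.
    private static final Pattern ID_WITH_CONTEXT =
        Pattern.compile("(.{0,12}?)\\b(\\d+)\\b(?=(.{0,12}))");

    // Hypothetical helper: collects {before, match, after} for each ID found.
    static List<String[]> findWithContext(String input) {
        List<String[]> results = new ArrayList<>();
        Matcher m = ID_WITH_CONTEXT.matcher(input);
        while (m.find()) {
            results.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return results;
    }

    public static void main(String[] args) {
        for (String[] r : findWithContext("I have a 1234, 2345 and 1456 id.")) {
            System.out.println("[" + r[0] + "] " + r[1] + " [" + r[2] + "]");
        }
        // prints:
        // [I have a ] 1234 [, 2345 and 1]
        // [, ] 2345 [ and 1456 id]
        // [ and ] 1456 [ id.]
    }
}
```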
