Question

I'm looking for a way to match part (or all) of a previously matched group. For example, suppose we have the following text:

this is a very long text "with" some quoted strings I "need" to extract in their own context

A regex like (.{1,20})(".*?")(.{1,20}) gives the following output:

# | 1st group           |   2nd group   |   3rd group
------------------------------------------------------------------
1 | is a very long text |   "with"      |   some quoted strings
2 | I                   |   "need"      |   to extract in their

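The goal is to force the regex to re-match part of the third group of the first match (or the whole match, when the quoted strings are very close together) as the first group of the second match. Basically, I would like the following output: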

# | 1st group           |   2nd group   |   3rd group
------------------------------------------------------------------
1 | is a very long text |   "with"      |   some quoted strings
2 | me quoted strings I |   "need"      |   to extract in their

Backreference support could probably solve this, but the regex engine lacks it.

Answer 1

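Going back to the original problem, what you actually need is to extract the quoted strings together with their context.

Since you don't have lookahead, you can use a regexp to match only the quoted strings (or even just strings.Index) and take the byte ranges of the matches, then widen each range to include the context. This may take more work if you are dealing with complex UTF-8 strings, because the offsets are in bytes, not characters.

Something like: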

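package main

import (
    "fmt"
    "regexp"
)

func main() {
    input := `this is a very long text "with" some quoted strings I "need" to extract in their own context`

    // Match only the quoted strings; the context is added by hand below.
    re := regexp.MustCompile(`(".*?")`)

    // Each match is a [start, end) pair of byte offsets into input.
    matches := re.FindAllStringIndex(input, -1)

    for _, m := range matches {
        // Widen the range by 20 bytes on each side, clamped to the string bounds.
        s := m[0] - 20
        e := m[1] + 20
        if s < 0 {
            s = 0
        }
        if e > len(input) {
            e = len(input)
        }
        fmt.Printf("%s\n", input[s:e])
    }
}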

https://play.golang.org/p/brH8v6OM-Fx

Re-matching part of the same or previous matched group
