Re-match part of the same or a previously matched group

Posted: 2018-09-17 12:39:16

Tags: regex go

I'm looking for a way to match part (or all) of a previously matched group. For example, suppose we have the following text:

this is a very long text "with" some quoted strings I "need" to extract in their own context

A regular expression such as (.{1,20})(".*?")(.{1,20}) produces the following output:

# | 1st group           |   2nd group   |   3rd group
------------------------------------------------------------------
1 | is a very long text |   "with"      |   some quoted strings
2 | I                   |   "need"      |   to extract in their
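To see why the second row starts with only " I ", the question's pattern can be run directly with Go's regexp package. This is a minimal sketch: contextMatches is a hypothetical helper name, and the input string is the one used in the answer below.

```go
package main

import (
	"fmt"
	"regexp"
)

// contextMatches runs the question's pattern and returns the three capture
// groups of every match. FindAllStringSubmatch only returns successive,
// non-overlapping matches, so text consumed by group 3 of one match is
// never available to group 1 of the next.
func contextMatches(input string) [][]string {
	re := regexp.MustCompile(`(.{1,20})(".*?")(.{1,20})`)
	var out [][]string
	for _, m := range re.FindAllStringSubmatch(input, -1) {
		out = append(out, m[1:]) // drop m[0], the full match
	}
	return out
}

func main() {
	input := `this is a very long text "with" some quoted strings I "need" to extract in their own context`
	for i, g := range contextMatches(input) {
		fmt.Printf("%d | %q | %q | %q\n", i+1, g[0], g[1], g[2])
	}
}
```

The second search resumes right after " some quoted strings", so the only text left for group 1 before "need" is " I ".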

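The goal is to force the regex, when matching the second quoted string, to re-match part of the third group from the first match (or the whole group, when the quoted strings are very close together). Basically, I'd like the following output: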

# | 1st group           |   2nd group   |   3rd group
------------------------------------------------------------------
1 | is a very long text |   "with"      |   some quoted strings
2 | me quoted strings I |   "need"      |   to extract in their

Backreference support could probably solve this, but Go's regexp engine lacks it.

1 Answer

Answer (score: 2)

Going back to the original problem: you need to extract the quoted strings together with their surrounding context.

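Since you don't have lookahead, you can use a regexp that matches only the quoted strings (or even just strings.Index), take their byte ranges, and then widen each range to include the context (this may take more work if you're dealing with complex UTF-8 strings).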

Something like:

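package main

import (
    "fmt"
    "regexp"
)

func main() {
    input := `this is a very long text "with" some quoted strings I "need" to extract in their own context`

    // Match only the quoted strings; the context is added afterwards.
    re := regexp.MustCompile(`(".*?")`)

    for _, m := range re.FindAllStringIndex(input, -1) {
        // Widen the byte range [m[0], m[1]) by up to 20 bytes on each
        // side, clamped to the bounds of the input.
        s := m[0] - 20
        e := m[1] + 20
        if s < 0 {
            s = 0
        }
        if e > len(input) {
            e = len(input) // a negative slice bound would panic
        }
        fmt.Printf("%s\n", input[s:e])
    }
}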

https://play.golang.org/p/brH8v6OM-Fx
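The answer mentions that strings.Index can replace the regexp and that byte-based widening needs more care with multi-byte UTF-8 text. Below is a minimal sketch of both ideas, assuming "context" means up to 20 runes (characters) rather than 20 bytes. quotedSpans and widenRunes are hypothetical helper names; only strings.Index, utf8.DecodeRuneInString and utf8.DecodeLastRuneInString come from the standard library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// quotedSpans returns the byte ranges [start, end) of "..." pairs, found
// with strings.Index instead of a regexp. An unmatched trailing quote is ignored.
func quotedSpans(s string) [][2]int {
	var spans [][2]int
	pos := 0
	for {
		open := strings.Index(s[pos:], `"`)
		if open < 0 {
			break
		}
		open += pos
		closeRel := strings.Index(s[open+1:], `"`)
		if closeRel < 0 {
			break
		}
		end := open + 1 + closeRel + 1 // byte just past the closing quote
		spans = append(spans, [2]int{open, end})
		pos = end
	}
	return spans
}

// widenRunes extends [start, end) by up to n runes on each side, so the
// result never splits a multi-byte UTF-8 character.
func widenRunes(s string, start, end, n int) string {
	for i := 0; i < n && start > 0; i++ {
		_, size := utf8.DecodeLastRuneInString(s[:start])
		start -= size
	}
	for i := 0; i < n && end < len(s); i++ {
		_, size := utf8.DecodeRuneInString(s[end:])
		end += size
	}
	return s[start:end]
}

func main() {
	input := `this is a very long text "with" some quoted strings I "need" to extract in their own context`
	for _, sp := range quotedSpans(input) {
		fmt.Println(widenRunes(input, sp[0], sp[1], 20))
	}
}
```

For plain ASCII input this prints the same two lines as the regexp version. Counting runes keeps the output valid UTF-8, but it still does not treat combining sequences as single characters.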