I'm looking for a way to match part (or all) of a previously matched group. For example, suppose we have the following text:
this is a very long text "with" some quoted strings I "need" to extract in their own context
A regular expression like
(.{1,20})(".*?")(.{1,20})
gives the following output:
# | 1st group | 2nd group | 3rd group
------------------------------------------------------------------
1 | is a very long text | "with" | some quoted strings
2 | I | "need" | to extract in their
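The goal is to make the regex re-match part of the first match's third group when it matches the second one (or all of it, when the quoted strings are very close together). Basically, I'd like the following output: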
# | 1st group | 2nd group | 3rd group
------------------------------------------------------------------
1 | is a very long text | "with" | some quoted strings
2 | me quoted strings I | "need" | to extract in their
Backreference support might solve this, but the regex engine lacks it.
Answer (score: 2)
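Going back to the original problem: you need to extract the quoted strings together with their context.
Since you don't have lookahead, you can use a regexp to match the quoted strings (or even just strings.Index), take only their byte ranges, and then widen those ranges to include the context (this may take more work if you're dealing with complex UTF-8 strings).
Something like: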
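```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	input := `this is a very long text "with" some quoted strings I "need" to extract in their own context`
	// Lazily match each quoted string; FindAllStringIndex returns [start, end) byte offsets.
	re := regexp.MustCompile(`".*?"`)
	for _, m := range re.FindAllStringIndex(input, -1) {
		// Widen the range by up to 20 bytes on each side, clamped to the input.
		// These are byte offsets, so slicing can split a multi-byte UTF-8 character.
		s := m[0] - 20
		e := m[1] + 20
		if s < 0 {
			s = 0
		}
		if e > len(input) {
			e = len(input)
		}
		fmt.Printf("%s\n", input[s:e])
	}
}
```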