Recursion, Self-Referencing Group (Qtax Trick), Reverse Qtax, or Balancing Groups

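Introduction

The idea of appending a list of integers to the bottom of the input is similar to a famous database hack (nothing to do with regex) where one joins to a table of integers. My original answer used the @Qtax trick. The current answer uses recursion, the Qtax trick (straight or in a reversed variation), or balancing groups.

Yes, it is possible... with some caveats and regex trickery.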

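The solutions in this answer are mostly meant as a vehicle to demonstrate some regex syntax, rather than as practical answers to be implemented.
At the end of your file, we will paste a list of numbers preceded by a unique delimiter. For this experiment, the appended string is :1:2:3:4:5:6:7. This is similar to the famous database hack that uses a table of integers.
For the first two solutions, we need an editor whose regex flavor allows recursion (solution 1) or self-referencing capture groups (solution 2 and its reversed variant). Two come to mind: Notepad++ and EditPad Pro. For the third solution, we need an editor that supports balancing groups. That probably limits us to EditPad Pro or Visual Studio 2013+.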

Input File

Let's say we are searching for pig and want to replace it with the line number.

We'll use this as input:

my cat
dog
my pig
my cow
my mouse

:1:2:3:4:5:6:7
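
The counter string does not have to be typed by hand. The following is a minimal Python sketch, not part of the original answer: `build_counter` is a hypothetical helper name, and the count of 7 simply mirrors the sample above. Passing `reverse=True` produces the numbers in descending order.

```python
def build_counter(count: int, delimiter: str = ":", reverse: bool = False) -> str:
    """Return a string such as ':1:2:3' (or ':3:2:1' when reverse=True)."""
    numbers = range(count, 0, -1) if reverse else range(1, count + 1)
    return "".join(f"{delimiter}{n}" for n in numbers)


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n"
# Append the counter after a blank line, as in the sample input above.
augmented = sample + "\n" + build_counter(7)
print(augmented)
```

The counter needs at least as many entries as the line number of the line you expect to match.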

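First Solution: Recursion

Supported languages: apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested).

The recursive structure lives in a lookahead, and is optional. Its job is to balance lines that don't contain pig, on the left, with numbers, on the right: think of it as balancing a nested construct like {{{ }}}... except that on the left we have the lines without a match, and on the right we have the numbers. The point is that when we exit the lookahead, we know how many lines were skipped.

Search: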

(?sm)(?=.*?pig)(?=((?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?:(?1)|[^:]+)(:\d+))?).*?\Kpig(?=.*?(?(2)\2):(\d+))

Free-spacing version with comments

(?xsm)                      # free-spacing mode, multi-line
(?=.*?pig)                  # fail right away if pig isn't there
(?=                         # The Recursive Structure Lives In This Lookahead
  (                         # Group 1
    (?:                     # skip one line
      ^
      (?:(?!pig)[^\r\n])*   # zero or more chars not followed by pig
      (?:\r?\n)             # newline chars
    )
    (?:(?1)|[^:]+)          # recurse Group 1 OR match all chars that are not a :
    (:\d+)                  # match digits
  )?                        # End Group
)                           # End lookahead.
.*?\Kpig                    # get to pig
(?=.*?(?(2)\2):(\d+))       # Lookahead: capture the next digits

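Replace: \3

In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.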

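Second Solution: Group that Refers to Itself ("Qtax Trick")

Supported languages: apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested). The solution is easy to adapt to .NET by converting the \K to a lookbehind and the possessive quantifier to an atomic group (see the .NET version a few lines below).

Search: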

(?sm)(?=.*?pig)(?:(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*+.*?\Kpig(?=[^:]+(?(1)\1):(\d+))

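.NET version: Back to the Future

.NET does not have \K. In its place, we use a "back to the future" lookbehind (a lookbehind that contains a lookahead that skips ahead of the match). Also, we need to use an atomic group instead of a possessive quantifier.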

(?sm)(?<=(?=.*?pig)(?=(?>(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*).*)pig(?=[^:]+(?(1)\1):(\d+))

Free-spacing version with comments (Perl / PCRE version):

(?xsm)                      # free-spacing mode, multi-line
(?=.*?pig)                  # lookahead: if pig is not there, fail right away to save the effort
(?:                         # start counter-line-skipper (lines that don't include pig)
  (?:                       # skip one line
    ^                       # beginning of line
    (?:(?!pig)[^\r\n])*     # zero or more chars not followed by pig
    (?:\r?\n)               # newline chars
  )
                            # for each line skipped, let Group 1 match an ever increasing
                            # portion of the numbers string at the bottom
  (?=                       # lookahead
    [^:]+                   # skip all chars that are not colons
    (                       # start Group 1
      (?(1)\1)              # match Group 1 if set
      :\d+                  # match a colon and some digits
    )                       # end Group 1
  )                         # end lookahead
)*+                         # end counter-line-skipper: zero or more times
.*?                         # match
\K                          # drop everything we've matched so far
pig                         # match pig (this is the match!)
(?=[^:]+(?(1)\1):(\d+))     # capture the next number to Group 2

Replace:

\2

Output:

my cat
dog
my 3
my cow
my mouse

:1:2:3:4:5:6:7
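
As a cross-check on the expected output, the counting logic can be mimicked in plain Python without a self-referencing group. This is only a sketch and not part of the original answer: `replace_first_with_counter` is a hypothetical name, and it assumes the counter string sits at the very end of the input and that only the first occurrence of the word is replaced.

```python
import re


def replace_first_with_counter(text: str, word: str = "pig") -> str:
    """Replace the first `word` with the counter entry for its line."""
    # Locate the trailing counter string, e.g. ":1:2:3:4:5:6:7".
    tail = re.search(r"(?::\d+)+\s*\Z", text)
    if tail is None:
        return text
    body, counter = text[:tail.start()], text[tail.start():]
    numbers = re.findall(r":(\d+)", counter)

    # Count the lines skipped before the first line that contains `word`.
    skipped = 0
    for line in body.splitlines():
        if word in line:
            break
        skipped += 1
    else:
        return text  # `word` never appears

    if skipped >= len(numbers):
        return text  # counter string is too short for this line
    # The entry right after the skipped ones is the 1-based line number.
    return body.replace(word, numbers[skipped], 1) + counter


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n\n:1:2:3:4:5:6:7"
print(replace_first_with_counter(sample))
```

Each skipped line corresponds to one counter entry consumed by Group 1 in the regex, so the entry that follows is the line number of the match.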

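In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.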

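Choosing a Delimiter for the Digits

In our example, the delimiter : for the string of digits is rather common and could occur elsewhere in the input. We could invent a UNIQUE_DELIMITER and tweak the expression slightly. However, the following optimization is more efficient and lets us keep the :

Optimization on the Second Solution: Reversed String of Digits

Instead of pasting our digits in order, it may be to our benefit to use them in reverse order: :7:6:5:4:3:2:1

In our lookaheads, this allows us to get down to the bottom of the input with a simple .* and to start backtracking from there. Since we know we are at the end of the string, we don't have to worry about the :digits being part of another section of the string. Here's how to do it.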

Input:

my cat pi g
dog p ig
my pig
my cow
my mouse

:7:6:5:4:3:2:1

Search:

(?xsm)                      # free-spacing mode, multi-line
(?=.*?pig)                  # lookahead: if pig is not there, fail right away to save the effort
(?:                         # start counter-line-skipper (lines that don't include pig)
  (?:                       # skip one line that doesn't have pig
    ^                       # beginning of line
    (?:(?!pig)[^\r\n])*     # zero or more chars not followed by pig
    (?:\r?\n)               # newline chars
  )
                            # Group 1 matches increasing portion of the numbers string at the bottom
  (?=                       # lookahead
    .*                      # get to the end of the input
    (                       # start Group 1
      :\d+                  # match a colon and some digits
      (?(1)\1)              # match Group 1 if set
    )                       # end Group 1
  )                         # end lookahead
)*+                         # end counter-line-skipper: zero or more times
.*?                         # match
\K                          # drop match so far
pig                         # match pig (this is the match!)
(?=.*:(\d+)(?(1)\1))        # capture the next number to Group 2; the leading colon keeps
                            # the greedy .* from splitting a multi-digit number

Replace: \2

See the substitutions in the demo.
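
To see why the reversed list works, it can help to trace how Group 1 grows from the end of the counter string, one entry per skipped line. The Python sketch below only imitates that bookkeeping and is not a regex; `reversed_counter_trace` is a hypothetical name introduced for illustration.

```python
import re


def reversed_counter_trace(tail: str, skipped_lines: int) -> tuple[str, str]:
    """Return (final Group 1 text, next number) for a reversed counter."""
    entries = re.findall(r":\d+", tail)  # [':7', ':6', ..., ':1']
    if skipped_lines >= len(entries):
        raise ValueError("counter string is too short")
    group1 = ""
    for _ in range(skipped_lines):
        # Each skipped line prepends the entry sitting just before Group 1.
        group1 = entries[len(entries) - 1 - group1.count(":")] + group1
    # The entry just before Group 1 holds the line number of the match.
    next_entry = entries[len(entries) - 1 - group1.count(":")]
    return group1, next_entry.lstrip(":")


print(reversed_counter_trace(":7:6:5:4:3:2:1", 2))
```

With two skipped lines, Group 1 ends up holding the last two entries and the number just before them is the one captured for the replacement.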

Third Solution: Balancing Groups

This solution is specific to .NET.

Search:

(?m)(?<=\A(?<c>^(?:(?!pig)[^\r\n])*(?:\r?\n))*.*?)pig(?=[^:]+(?(c)(?<-c>:\d+)*):(\d+))

Free-spacing version with comments

(?xm)                       # free-spacing, multi-line
(?<=                        # lookbehind
  \A                        # start of input
  (?<c>                     # skip one line that doesn't have pig
                            # The length of Group c Captures will serve as a counter
    ^                       # beginning of line
    (?:(?!pig)[^\r\n])*     # zero or more chars not followed by pig
    (?:\r?\n)               # newline chars
  )                         # end skipper
  *                         # repeat skipper
  .*?                       # we're on the pig line: lazily match chars before pig
)                           # end lookbehind
pig                         # match pig: this is the match
(?=                         # lookahead
  [^:]+                     # get to the digits
  (?(c)                     # if Group c has been set
    (?<-c>:\d+)             # decrement c while we match a group of digits
    *                       # repeat: this will only repeat as long as the length of Group c captures > 0
  )                         # end if Group c has been set
  :(\d+)                    # Match the next digit group, capture the digits
)                           # end lookahead

Replace: $1

Reference

Qtax trick

On Which Line Number Was the Regex Match Found?

Answer 2

Since you didn't specify which text editor, in Vim it would be:

:%s/searched_word/\=printf('%-4d', line('.'))/g

But somebody mentioned that this is not a question for SO but rather for Super User ;)

Answer 3

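I don't know of an editor that does this, short of extending an editor that allows arbitrary extensions.

You could easily use perl to do the task, though.

perl -i.bak -pe "s/word/$./eg" file

Or if you want to use wildcards,

perl -MFile::DosGlob=glob -i.bak -pe "BEGIN { @ARGV = map glob($_), @ARGV } s/word/$./eg" *.txt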
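
For readers without Perl at hand, the same per-line substitution can be sketched in Python. This is an illustration rather than a drop-in replacement: `replace_with_line_numbers` is a hypothetical name, it works on a string instead of editing files in place, and it treats the word as literal text, whereas the Perl `s/word/.../` treats it as a pattern.

```python
import re


def replace_with_line_numbers(text: str, word: str) -> str:
    """Replace every occurrence of `word` with its 1-based line number."""
    pattern = re.compile(re.escape(word))  # literal match (assumption)
    out = []
    # keepends=True preserves the original line terminators.
    for lineno, line in enumerate(text.splitlines(keepends=True), start=1):
        out.append(pattern.sub(str(lineno), line))
    return "".join(out)


print(replace_with_line_numbers("my cat\ndog\nmy pig\nmy cow\n", "pig"))
```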

Can a regex return the line number where the match is found?
