Question

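I need a regular expression that matches a paragraph from '&Start a' (the first one in the sample text) through '&end a' (the last one in the sample text). The problem is that '&end a' is not always spelled out; sometimes it is written as just '&end'. It gets worse when '&Start b' and '&end b' appear in between (the latter is sometimes also just '&end', hence the confusion).

A sample block this regex should target (sorry for posting it as a code block):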

junk text

&Start a <

fulfilling text

fulfilling text

&Start b

&Start c

&end c

fulfilling text

&end

&end <

junk text

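So the regex should match the whole paragraph that starts and ends with the lines marked with the < symbol, although that symbol is not part of the original text. In other words: take the '&Start X' we want, and skip the inner '&Start Y' and '&end' (or '&end Y') groups until we reach the '&end' (or '&end X') we want.

This is not simple to implement. The expression I am using is: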

&start a([^&]*)(&end a|&end)

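It matches isolated '&start' and '&end' paragraphs well, but the script gets confused when other '&start Y' lines appear in between. I may have to use some if statements to skip the unwanted blocks... Here is a more complicated case: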

junk text

&Start a <

fulfilling text

fulfilling text

&Start b

&Start c

&end

fulfilling text

&end

&end <

junk text

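Here none of the '&end' lines is named. Note 1: '&start X' is always defined, but '&end X' may also be written as just '&end'; either way it always corresponds to the closest start. Note 2: because of stack overflow errors, I could not change the structure of my regex to adapt it to this specific case.

Sorry for the strange explanation, but I hope someone can come up with a workable suggestion.

Thanks

Edit: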

#@ -split "`n" | ForEach-Object { $_.trim() } |

$files = Get-ChildItem "$PSScriptRoot" # root path

for($i=0; $i -lt $files.Count; $i++){

    #iterate through files from the current folder.
    $data = Get-Content -Path $files[$i].FullName

    # parse DisabledFeatures.txt file as array of strings (1 string per line of the file)
    $feature = Get-Content DisabledFeatures.txt

    #iterate for each string entry in $feature array (read from txt file)
    for($counter=0; $counter -lt $feature.Count; $counter++){

        #retrieve the current array entry to use it in the main algorithm
        $groupID = $feature[$counter]

        $data | ForEach-Object -Begin { $ignore = $false; $levels = 0 } -Process {
            #Start ignoring text after we've found the trigger
            if($_ -match "^`#ifdef $groupID") { $ignore = $true }   
            #Track nested groups
            elseif($ignore) {
                if ($_ -match '^#ifdef') { $levels++ }
                elseif ($_ -match '#endif') {
                    if($levels -ge 1) { $levels-- }
                    #If no nesting, we've hit the end of our targeted group. Stop ignoring
                    else { $ignore = $false }
                }
            }
            #Write line
            else { $_ }
        }  
    }
}

Answer 1

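A pure regex solution is probably not the best way to solve this. It can probably be done, but it would likely be very complex and hard to read. I would use a simple parser instead. For example: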

function Remove-TextGroup {
    param(
        [Parameter(Mandatory=$true)]
        [string[]]$Data,
        [Parameter(Mandatory=$true)]
        [string]$GroupID
    )

    $Data | ForEach-Object -Begin { $ignore = $false; $levels = 0 } -Process {
        #Start ignoring text after we've found the trigger
        if($_ -match "^&start $GroupID") { $ignore = $true }   
        #Track nested groups
        elseif($ignore) {
            if ($_ -match '^&start') { $levels++ }
            elseif ($_ -match '^&end') {
                if($levels -ge 1) { $levels-- }
                #If no nesting, we've hit the end of our targeted group. Stop ignoring
                else { $ignore = $false }
            }
        }
        #Write line
        else { $_ }

    }
}

Usage:

$data = @"
junk text

&Start a <

fulfilling text

fulfilling text

&Start b

&Start c

&end

fulfilling text

&end

&end <

junk text
"@ -split "`n" | ForEach-Object { $_.trim() } |
#Remove empty lines
Where-Object { $_ }

Remove-TextGroup -Data $data -GroupID a    

#Or to read from file.. 
#$data = Get-Content -Path Myfile.txt
Remove-TextGroup -Data $data -GroupID a

Output:

junk text
junk text
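
For comparison, the pure-regex route mentioned at the top of this answer might look roughly like the sketch below. It relies on .NET balancing groups (available to PowerShell's -replace) to count nested '&start'/'&end' lines. The variable names ($groupId, $restOfLine, $pattern) and the sample text are assumptions made for illustration, and the pattern has not been tried against the asker's real files, so treat it as an illustration of the complexity rather than a recommendation.

```powershell
# Sketch only: remove one named group with a .NET balancing-group regex.
$groupId    = 'a'            # hypothetical: the group to strip
$restOfLine = '.*\r?\n'      # remainder of a line plus its line break

$pattern = '(?im)' +                                              # ignore case, ^ matches at line starts
    '^&start ' + [regex]::Escape($groupId) + '\b' + $restOfLine + # opening line of the target group
    '(?:' +
        '(?<open>^&start\b' + $restOfLine + ')' +                 # nested start: push onto "open"
        '|(?<-open>^&end\b' + $restOfLine + ')' +                 # nested end: pop, fails if nothing is open
        '|(?!&(?:start|end)\b)' + $restOfLine +                   # any other line
    ')*' +
    '(?(open)(?!))' +                                             # every nested group must be closed
    '^&end\b.*(?:\r?\n)?'                                         # closing line of the target group

$text = @(
    'junk text', '&Start a <', 'fulfilling text', '&Start b', '&Start c',
    '&end', 'fulfilling text', '&end', '&end <', 'junk text'
) -join "`n"

$result = $text -replace $pattern, ''
$result
```

Even in this small form the pattern is harder to read and to debug than the line-by-line parser, which is the main reason to prefer the parser.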

If the files are large, I would optimize the parser example above by reading the file with a StreamReader.
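
A minimal sketch of what that StreamReader variant could look like, assuming the same '&start'/'&end' rules as Remove-TextGroup. The function name Remove-TextGroupFromFile and its -Path parameter are hypothetical and not part of the original answer; the point is only that lines are read one at a time instead of loading the whole file into an array.

```powershell
function Remove-TextGroupFromFile {
    param(
        [Parameter(Mandatory=$true)]
        [string]$Path,        # hypothetical parameter: file to read
        [Parameter(Mandatory=$true)]
        [string]$GroupID
    )

    # .NET does not follow PowerShell's current location, so resolve to a full path first
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader = New-Object System.IO.StreamReader -ArgumentList $fullPath
    try {
        $ignore = $false
        $levels = 0
        # ReadLine() returns $null at end of file
        while ($null -ne ($line = $reader.ReadLine())) {
            if (-not $ignore) {
                #Start ignoring text after we've found the trigger
                if ($line -match "^&start $GroupID") { $ignore = $true }
                #Write line
                else { $line }
            }
            #Track nested groups
            elseif ($line -match '^&start') { $levels++ }
            elseif ($line -match '^&end') {
                if ($levels -ge 1) { $levels-- }
                #If no nesting, we've hit the end of our targeted group. Stop ignoring
                else { $ignore = $false }
            }
        }
    }
    finally {
        # Release the file handle even if something above throws
        $reader.Dispose()
    }
}

#Remove-TextGroupFromFile -Path Myfile.txt -GroupID a
```

Unlike the usage example above, this sketch does not trim lines or drop empty ones; pipe the output through Where-Object { $_ } if that is needed.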
