
Question

I have a text file containing multiple items (blocks of text), like this:

SAMPLE
ITEM_ID sample_id_0000028
blah blah
ABCD <--- do NOT remove
blah blah blah
blah blah
blah
SAMPLE_END


SAMPLE
ITEM_ID sample_id_0000033
other text
more text
ABCD <--- Remove this
more text
SAMPLE_END

SAMPLE
ITEM_ID sample_id_00041
ABCD <--- do NOT remove
blah blah blah
blah
SAMPLE_END

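I want to replace/remove the instance of ABCD that occurs in the item identified as sample_id_0000033. The challenge is that the file contains other instances of ABCD that I want to leave alone. In addition, the number of lines between ITEM_ID and ABCD varies from item to item, and ABCD might not be present in the specified item at all.

I have to manipulate the file in VBA, via VBScript. I figured I would use a regex for this, but VBA does not support regular expressions with lookbehind. Is there a pattern that can accomplish this, using negative lookahead or something simpler than that?

I would run the regex on a string defined as textfile.ReadAll, where textfile is a TextStream.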

Answer 1

You can use:

pattern: (ITEM_ID sample_id_0000033\D(?:[^S]|S(?!AMPLE_END))+?)ABCD
replace: $1
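
To make the first pattern concrete, here is a minimal sketch in Python rather than VBScript, so the replacement is written \1 instead of $1. It assumes the lookahead is meant to read (?!AMPLE_END), so that the repeated group can never step over a SAMPLE_END marker. The function name remove_marker_in_item and the shortened sample text are made up for illustration.

```python
import re


def remove_marker_in_item(text, item_id, marker="ABCD"):
    """Remove the marker that follows item_id without crossing SAMPLE_END."""
    # Consume any character that is not "S", or an "S" that does not start
    # "SAMPLE_END", so the match stays inside the current item.
    body = r"(?:[^S]|S(?!AMPLE_END))+?"
    pattern = (
        r"(ITEM_ID " + re.escape(item_id) + r"\D" + body + r")"
        + re.escape(marker)
    )
    # Python spells the backreference \1 where VBScript uses $1.
    return re.sub(pattern, r"\1", text)


# Shortened, hypothetical version of the sample file (LF line endings).
TEXT = "\n".join([
    "SAMPLE",
    "ITEM_ID sample_id_0000028",
    "blah blah",
    "ABCD keep one",
    "SAMPLE_END",
    "",
    "SAMPLE",
    "ITEM_ID sample_id_0000033",
    "other text",
    "ABCD remove me",
    "more text",
    "SAMPLE_END",
    "",
    "SAMPLE",
    "ITEM_ID sample_id_00041",
    "ABCD keep two",
    "SAMPLE_END",
]) + "\n"

RESULT = remove_marker_in_item(TEXT, "sample_id_0000033")
print(RESULT)
```

If the target item contains no ABCD, the repeated group stops at SAMPLE_END, the match fails, and the text comes back unchanged.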

Or better, this:

pattern: (ITEM_ID sample_id_0000033\D(?:[^\r]+\r\n)+?)ABCD
replace: $1

Or shorter, as in acheong87's example:

pattern: (sample_id_0000033\D(?:[^\r]+\r\n)+?)ABCD
replace: $1
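
The line-based patterns above depend on two properties of the input: CRLF line endings, and a blank line after each item (a blank line starts with \r, which [^\r]+ cannot match, so the repetition stops there). The following Python sketch illustrates both points. The helper name strip_abcd and the sample lines are assumptions for illustration, and Python writes the replacement as \1 rather than $1.

```python
import re

# The shortest pattern from the answer, unchanged. It expects "\r\n" endings.
PATTERN = re.compile(r"(sample_id_0000033\D(?:[^\r]+\r\n)+?)ABCD")


def strip_abcd(text):
    # \1 in Python plays the role of $1 in the VBScript replacement.
    return PATTERN.sub(r"\1", text)


# Hypothetical sample lines, modelled on the question's file.
LINES = [
    "SAMPLE",
    "ITEM_ID sample_id_0000028",
    "ABCD keep one",
    "SAMPLE_END",
    "",
    "SAMPLE",
    "ITEM_ID sample_id_0000033",
    "other text",
    "ABCD remove me",
    "more text",
    "SAMPLE_END",
    "",
    "SAMPLE",
    "ITEM_ID sample_id_00041",
    "ABCD keep two",
    "SAMPLE_END",
]
CRLF_TEXT = "\r\n".join(LINES) + "\r\n"
LF_TEXT = "\n".join(LINES) + "\n"

print(strip_abcd(CRLF_TEXT))                # only the ABCD in item 0000033 is removed
print(strip_abcd(LF_TEXT) == LF_TEXT)       # LF-only input is left untouched
```

If the file might use LF-only line endings, or if items are not separated by blank lines, this pattern is not a safe choice as written.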

Answer 2

You need some way to delimit each "block", for example a blank line between blocks. For instance, you could replace

(sample_id_0000033(?:\r|\n|\r\n)(?:.*\S.*(?:\r|\n|\r\n))*)ABCD

with

$1

Here is what's going on.

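sample_id_0000033 is self-explanatory.
I wrote (?:\r|\n|\r\n) as shorthand for "any type of line break", whether CR (Mac), LF (UNIX), or CR/LF (DOS). An even shorter form is (?:\r|\r?\n). The reason I didn't write something like [\r\n]+ or \s+ is that we want to match exactly one line break, so that a blank line still marks the end of the block.
Next, we want to skip lines that contain at least one non-whitespace character, i.e. non-blank lines: .*\S.* followed, of course, by any type of line break. Note that by default the wildcard . does not match line breaks; if you are in dot-matches-newlines mode, you should use [^\r\n] instead of the dot.
The non-capturing groups (?: ... ) are optional, but they are good practice since we don't intend to use these groups.
If we eventually reach a line that starts with ABCD, everything before it is captured in $1 and put back as-is by the replacement, without the ABCD. If we reach a blank line before encountering a line with ABCD, the match fails and nothing is replaced.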
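
The walkthrough above can be sketched in Python as follows. This is an illustration under stated assumptions, not the answer's own code: Python's dot also matches \r, so the sketch uses [^\r\n] in place of the dot (the substitution the answer suggests for dot-matches-newlines mode), and the names NEWLINE, NON_BLANK_LINE and remove_abcd are invented here.

```python
import re

# Exactly one line break of any kind: CR, LF, or CR/LF.
NEWLINE = r"(?:\r|\n|\r\n)"
# A line with at least one non-whitespace character, plus its line break.
# [^\r\n] stands in for "." because Python's dot would also match "\r".
NON_BLANK_LINE = r"[^\r\n]*\S[^\r\n]*" + NEWLINE


def remove_abcd(text, item_id="sample_id_0000033"):
    pattern = (
        "(" + re.escape(item_id) + NEWLINE
        + "(?:" + NON_BLANK_LINE + ")*)ABCD"
    )
    # \1 restores everything captured before ABCD (VBScript: $1).
    return re.sub(pattern, r"\1", text)


# Hypothetical sample lines; blank lines separate the blocks.
LINES = [
    "SAMPLE",
    "ITEM_ID sample_id_0000028",
    "ABCD keep one",
    "SAMPLE_END",
    "",
    "SAMPLE",
    "ITEM_ID sample_id_0000033",
    "other text",
    "ABCD remove me",
    "more text",
    "SAMPLE_END",
    "",
    "SAMPLE",
    "ITEM_ID sample_id_00041",
    "ABCD keep two",
    "SAMPLE_END",
]
LF_TEXT = "\n".join(LINES) + "\n"
CRLF_TEXT = "\r\n".join(LINES) + "\r\n"

print(remove_abcd(LF_TEXT))
```

Because the only boundary this pattern knows about is a blank line, it will run on into the next block when blocks follow each other without one.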

Answer 3

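Consider the following PowerShell example of a general-purpose regex plus some logic. It does not use any regex lookarounds, and it will match ABCD wherever it appears among the lines of the block.

You should be able to rewrite this concept as VBA logic.

Example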

$Matches = @()
$String = 'SAMPLE
ITEM_ID sample_id_0000028
blah blah
ABCD <--- do NOT remove
blah blah blah
blah blah
blah
SAMPLE_END


SAMPLE
ITEM_ID sample_id_0000033
other text
more text
ABCD <--- Remove this
more text
SAMPLE_END

SAMPLE
ITEM_ID sample_id_00041
ABCD <--- do NOT remove
blah blah blah
blah
SAMPLE_END

SAMPLE
ITEM_ID sample_id_0000028
blah blah
ABCD <--- do NOT remove
blah blah blah
blah blah
blah
SAMPLE_END
SAMPLE
ITEM_ID sample_id_0000033
other text
more text
ABCD <--- Remove this
more text
SAMPLE_END
SAMPLE
ITEM_ID sample_id_00041
ABCD <--- do NOT remove
blah blah blah
blah
SAMPLE_END'


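$NewString = $String
([regex]'(sample_id_0000033((.|\n|\r)*?)SAMPLE_END)').Matches($String) | ForEach-Object {
    Write-Host '--------------------------------------------'
    Write-Host "found at $($_.Groups[1].Index) = '$($_.Groups[1].Value)'"
    Write-Host "found at $($_.Groups[2].Index) = '$($_.Groups[2].Value)'"

    # Group 1 is the whole record; group 2 is the text between the ID and SAMPLE_END
    $ThisRecord = $_.Groups[1].Value
    $InnerText = $_.Groups[2].Value

    # Replace the marker inside the inner text only
    $NewInnerText = $InnerText -replace "ABCD", "I like kittens"

    # String.Replace() treats its arguments literally, so regex
    # metacharacters in the record text cannot break the substitution
    $NewRecord = $ThisRecord.Replace($InnerText, $NewInnerText)

    Write-Host
    Write-Host NewRecord:
    Write-Host $NewRecord

    $NewString = $NewString.Replace($ThisRecord, $NewRecord)
} # next match

Write-Host '--------------------------------------------'
Write-Host 'New String'
Write-Host $NewString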

Output

Note that in this example I left the <--- Remove this text in the string, so that it is easier to see where the change happened.

--------------------------------------------
found at 136 = 'sample_id_0000033
other text
more text
ABCD <--- Remove this
more text
SAMPLE_END'
found at 153 = '
other text
more text
ABCD <--- Remove this
more text
'

NewRecord:
sample_id_0000033
other text
more text
I like kittens <--- Remove this
more text
SAMPLE_END
--------------------------------------------
found at 452 = 'sample_id_0000033
other text
more text
ABCD <--- Remove this
more text
SAMPLE_END'
found at 469 = '
other text
more text
ABCD <--- Remove this
more text
'

NewRecord:
sample_id_0000033
other text
more text
I like kittens <--- Remove this
more text
SAMPLE_END
--------------------------------------------
New String
SAMPLE
ITEM_ID sample_id_0000028
blah blah
ABCD <--- do NOT remove
blah blah blah
blah blah
blah
SAMPLE_END


SAMPLE
ITEM_ID sample_id_0000033
other text
more text
I like kittens <--- Remove this
more text
SAMPLE_END

SAMPLE
ITEM_ID sample_id_00041
ABCD <--- do NOT remove
blah blah blah
blah
SAMPLE_END

SAMPLE
ITEM_ID sample_id_0000028
blah blah
ABCD <--- do NOT remove
blah blah blah
blah blah
blah
SAMPLE_END
SAMPLE
ITEM_ID sample_id_0000033
other text
more text
I like kittens <--- Remove this
more text
SAMPLE_END
SAMPLE
ITEM_ID sample_id_00041
ABCD <--- do NOT remove
blah blah blah
blah
SAMPLE_END

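Summary

The regex (sample_id_0000033((.|\n|\r)*?)SAMPLE_END) finds every block of text that starts with sample_id_0000033 and ends with the next SAMPLE_END. Of course, if you use a different delimiter for the end of a record, you would need to use that delimiter here instead.
Behind the scenes, PowerShell collects all the found substrings into a match collection. These are then passed into the foreach loop, where $_ is the current match. (The $Matches array initialized at the top of the example is not populated by this call, so that line can be dropped.)
Inside the foreach block, we process each match that was found:
- Replace the known text ABCD with the desired string I like kittens, and store the result in $NewInnerText. I created a separate variable here because $InnerText does not include the opening and closing strings; depending on the actual value of ABCD, a replacement over the whole record could accidentally change text in the end marker.
- $NewRecord is created by replacing $InnerText with $NewInnerText inside $ThisRecord.
- $NewString is then updated by replacing $ThisRecord with $NewRecord.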
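
For readers who want to port the idea, here is a minimal Python sketch of the same two-step approach: find each record, then substitute only inside it. It is not a translation of the PowerShell above. It uses re.DOTALL instead of the (.|\n|\r) alternation, and it rewrites each record in place through a callback rather than searching the full string for the record text again. The function name replace_in_item and the sample text are assumptions.

```python
import re


def replace_in_item(text, item_id="sample_id_0000033",
                    old="ABCD", new="I like kittens"):
    # Step 1: locate each record, from the item ID to the next SAMPLE_END.
    # re.DOTALL lets "." cross line breaks, so (.|\n|\r) is not needed.
    record = re.compile(
        "(" + re.escape(item_id) + ")(.*?)(SAMPLE_END)", re.DOTALL
    )

    def fix_record(match):
        # Step 2: change only the inner text; the ID and the end marker
        # are put back exactly as they were found.
        inner = match.group(2).replace(old, new)
        return match.group(1) + inner + match.group(3)

    return record.sub(fix_record, text)


# Hypothetical sample: records follow each other without blank lines.
TEXT = "\n".join([
    "SAMPLE",
    "ITEM_ID sample_id_0000028",
    "ABCD keep one",
    "SAMPLE_END",
    "SAMPLE",
    "ITEM_ID sample_id_0000033",
    "other text",
    "ABCD change me",
    "SAMPLE_END",
    "SAMPLE",
    "ITEM_ID sample_id_00041",
    "ABCD keep two",
    "SAMPLE_END",
    "SAMPLE",
    "ITEM_ID sample_id_0000033",
    "ABCD change me too",
    "SAMPLE_END",
]) + "\n"

RESULT = replace_in_item(TEXT)
print(RESULT)
```

Since the record boundary is the SAMPLE_END marker rather than a blank line, this variant does not depend on how the records are spaced in the file.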

Regex to replace text without using lookbehind
