众所周知,单词的查找和替换“通配符”功能会受到一些严重的限制。
以下代码在word文档中实现了真正的正则表达式查找和替换,并且在其他Stackoverflow和SuperUser问题中找到了它的变体。
Sub RegEx_PlainText(Before As String, After As String)
Dim regexp As Object
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = Before
.IgnoreCase = True
.Global = True
'could be any Range , .Range.Text , or selection object
ActiveDocument.Range = .Replace(ActiveDocument.Range, After)
End With
End Sub
然而,这会擦除所有格式的文档。
即使字符串长度相同或字符串相同,Word也不会保留字符格式,因此ActiveDocument.Range = ActiveDocument.Range
或Selection.Text=Selection.Text
会擦除所有格式(或者更准确地说,格式化整个格式)范围与范围中的第一个字符相同,并添加回车符。经过反思,这种行为并不令人惊讶。
要解决此问题,以下代码会运行正则表达式查找,然后遍历匹配项并仅在找到匹配项的范围内运行.replace
。这样,只会丢失格式化,如果匹配自己有各种格式(例如斜体字将丢失)
希望代码注释能够使它非常透明。
Sub RegEx(Before As String, After As String, _
Optional CaseSensitive As Boolean = False, _
Optional Location As Range = Nothing, _
Optional DebugMode As Boolean = False)
'can't declare activedocument.range in parameters
If Location Is Nothing Then Set Location = ActiveDocument.Range
Dim regexp As Object
Dim Foundmatches As Object
Dim Match As Object
Dim MatchRange As Range
Dim offset As Integer: offset = 0
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = Before
.IgnoreCase = Not CaseSensitive
.Global = True
'set foundmatches to collection of all regex matches
Set Foundmatches = .Execute(Location.text)
For Each Match In Foundmatches
'set matchrange to location of found string in source doc.
'offset accounts for change in length of document from already completed replacements
Set MatchRange = Location.Document _
.Range(Match.FirstIndex + offset, _
Match.FirstIndex + Match.Length + offset)
'debugging
If DebugMode Then
Debug.Print "strfound = " & Match.Value
Debug.Print "matchpoint = " & Match.FirstIndex
Debug.Print "origstrlength = " & Match.Length
Debug.Print "offset = " & offset
Debug.Print "matchrange = " & MatchRange.text
MatchRange.Select
Stop
Else
'REAL LIFE
'run the regex replace just on the range containing the regex match
MatchRange = .Replace(MatchRange, After)
'increment offset to account for change in length of document
offset = offset + MatchRange.End - MatchRange.Start - Match.Length
End If
Next
End With
End Sub
这适用于简单文档,但是当我在真实文档上运行时,matchrange
最终会在找到匹配的位置附近,但不完全正确。它不可预测地关闭,有时它是向右,有时是向左。通常文档越复杂。 (链接,上下文表格,格式化等)最终会出错。
有谁知道为什么这不起作用,以及如何修复它?如果我能理解为什么这不起作用,那么我可能能够确定这种方法是否可以修复,或者如果我只是需要尝试不同的方法。
代码包含DebugMode参数,这意味着它将循环遍历文档并突出显示所有匹配项,不执行任何更改。还向控制台输出一堆内容。这应该对任何有足够的人来帮助我解决这个问题。
https://calibre-ebook.com/downloads/demos/demo.docx这是一份可能有用的示例文档(不是我制作的)。
答案 0 :(得分:3)
@Some_Guy:感谢您提出这个问题,我遇到了类似的问题,您的帖子为我节省了相当多的时间。
这是我想出来的kludge:
Sub RegEx(Before As String, After As String, _
Optional CaseSensitive As Boolean = False, _
Optional Location As Range = Nothing, _
Optional DebugMode As Boolean = False)
'can't declare activedocument.range in parameters
If Location Is Nothing Then Set Location = ActiveDocument.Range
Dim j As Long
Dim regexp As Object
Dim Foundmatches As Object
Dim Match As Object
Dim MatchRange As Range
Dim offset As Integer: offset = 0
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = Before
.IgnoreCase = Not CaseSensitive
.Global = True
'set foundmatches to collection of all regex matches
Set Foundmatches = .Execute(Location.Text)
For j = Foundmatches.Count - 1 To 0 Step -1
If DebugMode = True Then
'debugging
Debug.Print Foundmatches(j), .Replace(Foundmatches(j), After)
Else
'REAL LIFE
'run a plain old find/replace on the found string and eplace strings
With ActiveDocument.Range.Find
.ClearFormatting
.Replacement.ClearFormatting
.Replacement.Font.Hidden = True
.Text = Foundmatches(j)
.Replacement.Text = regexp.Replace(Foundmatches(j), After)
.Execute Replace:=wdReplaceAll
End With
End If
Next j
End With
End Sub
基本上我使用一个简单的查找/替换字符串,匹配每个找到(并将被替换)的项目与正则表达式,将在Word中存在相当的支持。请注意,替换的任何文本都采用第一个替换字符的格式,因此如果第一个单词以粗体显示,则所有替换的文本都将为粗体。