正则表达式Microsoft Word没有破坏文档格式

时间:2017-01-18 14:41:01

标签: regex vba ms-word word-vba

众所周知,单词的查找和替换“通配符”功能会受到一些严重的限制。

以下代码在word文档中实现了真正的正则表达式查找和替换,并且在其他Stackoverflow和SuperUser问题中找到了它的变体。

Sub RegEx_PlainText(Before As String, After As String)

    Dim regexp As Object
    Set regexp = CreateObject("vbscript.regexp")            

    With regexp
        .Pattern = Before
        .IgnoreCase = True
        .Global = True

         'could be any Range , .Range.Text , or selection object
         ActiveDocument.Range = .Replace(ActiveDocument.Range, After)

    End With
End Sub

然而,这会擦除所有格式的文档。

即使字符串长度相同或字符串相同,Word也不会保留字符格式,因此ActiveDocument.Range = ActiveDocument.RangeSelection.Text=Selection.Text会擦除所有格式(或者更准确地说,格式化整个格式)范围与范围中的第一个字符相同,并添加回车符。经过反思,这种行为并不令人惊讶。

要解决此问题,以下代码会运行正则表达式查找,然后遍历匹配项并仅在找到匹配项的范围内运行.replace这样,只会丢失格式化,如果匹配自己有各种格式(例如斜体字将丢失)

希望代码注释能够使它非常透明。

Sub RegEx(Before As String, After As String, _
          Optional CaseSensitive As Boolean = False, _
          Optional Location As Range = Nothing, _
          Optional DebugMode As Boolean = False)

    'can't declare activedocument.range in parameters
    If Location Is Nothing Then Set Location = ActiveDocument.Range

    Dim regexp As Object
    Dim Foundmatches As Object
    Dim Match As Object
    Dim MatchRange As Range
    Dim offset As Integer: offset = 0
    Set regexp = CreateObject("vbscript.regexp")

   With regexp
        .Pattern = Before
        .IgnoreCase = Not CaseSensitive
        .Global = True

        'set foundmatches to collection of all regex matches
        Set Foundmatches = .Execute(Location.text)

        For Each Match In Foundmatches

            'set matchrange to location of found string in source doc.
            'offset accounts for change in length of  document from already completed replacements
            Set MatchRange = Location.Document _
                   .Range(Match.FirstIndex + offset, _
                          Match.FirstIndex + Match.Length + offset)

            'debugging
            If DebugMode Then
                    Debug.Print "strfound      = " & Match.Value
                    Debug.Print "matchpoint    = " & Match.FirstIndex
                    Debug.Print "origstrlength = " & Match.Length
                    Debug.Print "offset        = " & offset
                    Debug.Print "matchrange    = " & MatchRange.text
                    MatchRange.Select
                Stop

            Else
            'REAL LIFE
                'run the regex replace just on the range containing the regex match
                MatchRange = .Replace(MatchRange, After)

                'increment offset to account for change in length of document
                offset = offset + MatchRange.End - MatchRange.Start - Match.Length
            End If
        Next
    End With
End Sub

这适用于简单文档,但是当我在真实文档上运行时,matchrange最终会在找到匹配的位置附近,但不完全正确。它不可预测地关闭,有时它是向右,有时是向左。通常文档越复杂。 (链接,上下文表格,格式化等)最终会出错。

有谁知道为什么这不起作用,以及如何修复它?如果我能理解为什么这不起作用,那么我可能能够确定这种方法是否可以修复,或者如果我只是需要尝试不同的方法。

代码包含DebugMode参数,这意味着它将循环遍历文档并突出显示所有匹配项,不执行任何更改。还向控制台输出一堆内容。这应该对任何有足够的人来帮助我解决这个问题。

https://calibre-ebook.com/downloads/demos/demo.docx这是一份可能有用的示例文档(不是我制作的)。

1 个答案:

答案 0 :(得分:3)

@Some_Guy:感谢您提出这个问题,我遇到了类似的问题,您的帖子为我节省了相当多的时间。

这是我想出来的kludge:

Sub RegEx(Before As String, After As String, _
          Optional CaseSensitive As Boolean = False, _
          Optional Location As Range = Nothing, _
          Optional DebugMode As Boolean = False)

    'can't declare activedocument.range in parameters
    If Location Is Nothing Then Set Location = ActiveDocument.Range

    Dim j As Long
    Dim regexp As Object
    Dim Foundmatches As Object
    Dim Match As Object
    Dim MatchRange As Range
    Dim offset As Integer: offset = 0
    Set regexp = CreateObject("vbscript.regexp")

    With regexp
        .Pattern = Before
        .IgnoreCase = Not CaseSensitive
        .Global = True

        'set foundmatches to collection of all regex matches
        Set Foundmatches = .Execute(Location.Text)
        For j = Foundmatches.Count - 1 To 0 Step -1

            If DebugMode = True Then
                'debugging
                Debug.Print Foundmatches(j), .Replace(Foundmatches(j), After)
            Else
                'REAL LIFE

                'run a plain old find/replace on the found string and eplace strings
                With ActiveDocument.Range.Find
                    .ClearFormatting
                    .Replacement.ClearFormatting
                    .Replacement.Font.Hidden = True
                    .Text = Foundmatches(j)
                    .Replacement.Text = regexp.Replace(Foundmatches(j), After)
                    .Execute Replace:=wdReplaceAll
                End With
            End If
        Next j
    End With
End Sub

基本上我使用一个简单的查找/替换字符串,匹配每个找到(并将被替换)的项目与正则表达式,将在Word中存在相当的支持。请注意,替换的任何文本都采用第一个替换字符的格式,因此如果第一个单词以粗体显示,则所有替换的文本都将为粗体。