Removing multiple whitespace characters

Date: 2016-01-30 14:16:06

Tags: vba ms-word word-vba

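I am trying to clean up every line of the selected text in Word by removing any unwanted characters (tabs, multiple spaces, multiple line breaks, blank lines).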

Say I have the following text:

John        Smith 



John    Anderson Smith  

John           A. Smith     
J.A. Smith

I want to end up with this:

John Smith 
John Anderson Smith 
John A. Smith 
J.A. Smith

Here is what I have tried so far. Removing tabs is simple:

With Selection
    .Text = Replace(.Text, vbTab, "")
End With

The same goes for newline characters:

With Selection
    .Text = Replace(.Text, vbCrLf, ",")
End With

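Unfortunately, when I try to remove two consecutive line breaks with .Text = Replace(.Text, vbCrLf & vbCrLf, ""), it has no effect. I also tried the following code, but it does not work either.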

With Selection
    Do While InStr(1, .Text, vbCrLf & vbCrLf)
        .Text = Replace(.Text, vbCrLf & vbCrLf, vbCrLf)
    Loop
End With
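
A likely reason both attempts do nothing: in a Word document, a paragraph mark in Selection.Text is a single carriage return (vbCr), not the vbCrLf pair, so the search string never occurs in the text. The following is a minimal sketch under that assumption. The Sub name CollapseParagraphMarks is hypothetical, and the loop works on a string copy so the selection is written back only once.

```vba
Sub CollapseParagraphMarks()
    Dim s As String
    s = Selection.Text

    ' Word separates paragraphs with vbCr (Chr(13)), so look for
    ' doubled vbCr rather than doubled vbCrLf.
    Do While InStr(1, s, vbCr & vbCr) > 0
        s = Replace(s, vbCr & vbCr, vbCr)
    Loop

    ' Write the cleaned text back in one step.
    ' Note: assigning .Text discards character formatting in the selection.
    Selection.Text = s
End Sub
```

This only collapses paragraph marks that are directly adjacent. Lines that contain nothing but spaces or tabs still count as non-empty and would need to be trimmed first.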

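I came up with another piece of code. It almost does the trick: it removes tabs, leading and trailing spaces, and runs of multiple spaces. But it also removes all newline characters, whereas I only want to remove repeated ones (i.e., several in a row). The regular expression \p{2,}\n does not work here. Removing blank lines would also be great.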

Sub test()
    With Selection
        Dim RegEx As Object
        Set RegEx = CreateObject("VBScript.RegExp")
        RegEx.Global = True
        RegEx.IgnoreCase = True
        RegEx.Pattern = "\s{2,}"
        .Text = Trim(RegEx.Replace(.Text, " "))
    End With
End Sub

2 answers:

Answer 0 (score: 1)

Try this:

Sub CleanUpPastedText()
    Application.ScreenUpdating = False
    With Selection.Range.Find
      .ClearFormatting
      .Replacement.ClearFormatting
      .Forward = True
      .Wrap = wdFindStop
      .Format = False
      .MatchAllWordForms = False
      .MatchSoundsLike = False
      .MatchWildcards = True
      'Eliminate spaces & tabs before paragraph breaks.
      .Text = "[ ^s^t]{1,}^13"
      .Replacement.Text = "^p"
      .Execute Replace:=wdReplaceAll
      'Replace single paragraph breaks with a space
      .Text = "([!^13])([^13])([!^13])"
      .Replacement.Text = "\1 \3"
      .Execute Replace:=wdReplaceAll
      'Replace all double spaces with single spaces
      .Text = "[ ]{2,}"
      .Replacement.Text = " "
      .Execute Replace:=wdReplaceAll
      'Delete hyphens in hyphenated text formerly split across lines
      .Text = "([a-z])-[ ]{1,}([a-z])"
      .Replacement.Text = "\1\2"
      .Execute Replace:=wdReplaceAll
      'Limit paragraph breaks to one per 'real' paragraph.
      .Text = "[^13]{1,}"
      .Replacement.Text = "^p"
      .Execute Replace:=wdReplaceAll
    End With
    'Restore Screen Updating
    Application.ScreenUpdating = True
End Sub

Edit

The code above works in Word 2007 and 2010.

For Office 16, you should use ^013 instead of ^13.
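
If the same macro has to run on several Word versions, the note above could be folded into a small helper. This is only a sketch: the function name ParagraphToken and the Sub name TrimBeforeParagraphBreaks are hypothetical, and the version threshold of 16 is taken directly from the note rather than verified independently.

```vba
' Returns the wildcard token for a paragraph mark, following the note above.
' Assumption: versions 16 and later need ^013, earlier ones accept ^13.
Function ParagraphToken() As String
    If Val(Application.Version) >= 16 Then
        ParagraphToken = "^013"
    Else
        ParagraphToken = "^13"
    End If
End Function

Sub TrimBeforeParagraphBreaks()
    With Selection.Range.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = True
        ' Same first step as in the macro above, with the token chosen at run time.
        .Text = "[ ^s^t]{1,}" & ParagraphToken()
        .Replacement.Text = "^p"
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Application.Version returns a string such as "16.0", and Val reads its leading numeric part, so the comparison does not depend on regional settings.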

Answer 1 (score: 0)

I finally managed to clean up the text. Consider the following text:

                John              Smith




John                    Anderson Smith
John      A. Smith
J.A. Smith






<followed by many newline characters and some lines containing only spaces or tabs>

The code from the other answer produces:

      John Smith
John                    Anderson Smith John A. Smith J.A. Smith

This is clearly incorrect. Here is my code:

With Selection
        Dim RegEx As Object
        Set RegEx = CreateObject("VBScript.RegExp")
        RegEx.Global = True
        ' Collapse runs of spaces and tabs into a single space
        RegEx.Pattern = "[ \t]+"
        .Text = RegEx.Replace(.Text, " ")
        ' Remove blank lines, including lines that hold only spaces or tabs
        RegEx.MultiLine = True
        RegEx.Pattern = "^(?:[\t ]*(?:\r?\n|\r))+"
        .Text = RegEx.Replace(.Text, "")
        ' Trim leading and trailing whitespace on each line
        ' the following is from http://stackoverflow.com/a/24049145/2657875
        RegEx.Pattern = "^[\s\xA0]+|[\s\xA0]+$"
        .Text = RegEx.Replace(.Text, "")
End With

The code produces the following expected result.

Jim Kane
John Andy Lemar
J. A. Smith
Jane Smith
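
The same idea can be packaged as a string-only function, so the selection is read and written just once. This is a rough sketch rather than the answer's exact method: the names CleanWhitespace and CleanSelection are hypothetical, and it assumes paragraph marks arrive as vbCr, normalizing vbCrLf and vbLf in case the text came from elsewhere.

```vba
Function CleanWhitespace(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True

    ' Normalize every line ending to vbCr, the paragraph mark Word uses.
    s = Replace(s, vbCrLf, vbCr)
    s = Replace(s, vbLf, vbCr)

    ' Collapse runs of spaces, tabs and non-breaking spaces into one space.
    re.Pattern = "[ \t\xA0]+"
    s = re.Replace(s, " ")

    ' Drop the single space left before or after a paragraph mark.
    re.Pattern = " ?\r ?"
    s = re.Replace(s, vbCr)

    ' Collapse consecutive paragraph marks, which removes blank lines.
    re.Pattern = "\r{2,}"
    s = re.Replace(s, vbCr)

    ' Strip whitespace at the very start and end of the whole string.
    re.Pattern = "^[ \r]+|[ \r]+$"
    s = re.Replace(s, "")

    CleanWhitespace = s
End Function

Sub CleanSelection()
    Selection.Text = CleanWhitespace(Selection.Text)
End Sub
```

Because the last step also strips a trailing paragraph mark, a selection that ends with one will merge with the paragraph after it. Leave that final mark out of the selection, or append vbCr to the result, if that matters.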