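I am trying to clean up every line of the selected text in Word so that any unwanted characters are removed (tabs, multiple spaces, multiple line breaks, empty lines).
Let's say I have the following text: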
John Smith
John Anderson Smith
John A. Smith
J.A. Smith
I would like to achieve the following:
John Smith
John Anderson Smith
John A. Smith
J.A. Smith
Let me show you what I have tried so far. Removing tabs is pretty simple:
With Selection
.Text = Replace(.Text, vbTab, "")
End With
The same goes for newline characters:
With Selection
.Text = Replace(.Text, vbCrLf, ",")
End With
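Unfortunately, when I try to remove two consecutive line breaks with .Text = Replace(.Text, vbCrLf & vbCrLf, ""), it has no effect.
I also tried the following code, but it does not work either.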
With Selection
Do While InStr(1, .Text, vbCrLf & vbCrLf)
.Text = Replace(.Text, vbCrLf & vbCrLf, vbCrLf)
Loop
End With
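One possible reason neither attempt matches: in text read from a Word selection, a paragraph end is a single carriage return (vbCr, Chr(13)) rather than the vbCrLf pair, so vbCrLf & vbCrLf would never be found. A minimal sketch under that assumption follows; CollapseParagraphBreaks is a hypothetical helper name, not part of the Word object model, and it works on a plain string so it can be tried without touching the document.

```vba
' Hypothetical helper: collapses runs of paragraph marks into a single one.
' Assumption: paragraph ends appear in the string as vbCr (Chr(13)).
Function CollapseParagraphBreaks(ByVal s As String) As String
    ' Replace is non-overlapping, so repeat until no doubled mark is left.
    Do While InStr(1, s, vbCr & vbCr) > 0
        s = Replace(s, vbCr & vbCr, vbCr)
    Loop
    CollapseParagraphBreaks = s
End Function

Sub DemoCollapseParagraphBreaks()
    Dim cleaned As String
    cleaned = CollapseParagraphBreaks("a" & vbCr & vbCr & vbCr & "b")
    ' Show the carriage returns explicitly in the Immediate window.
    Debug.Print Replace(cleaned, vbCr, "<CR>")   ' prints a<CR>b
End Sub
```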
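I came up with another piece of code. It almost does the trick, i.e. it removes tabs, leading and trailing spaces, and runs of two or more spaces. But it also removes all newline characters, whereas I only want to remove the repeated ones (i.e. several in a row). The regular expressions \p{2,} and \n do not work here. Removing empty lines would be great as well.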
Sub test()
With Selection
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.IgnoreCase = True
RegEx.Pattern = "\s{2,}"
.Text = Trim(RegEx.Replace(.Text, " "))
End With
End Sub
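The line breaks disappear because \s in VBScript.RegExp matches carriage returns and line feeds as well as spaces and tabs, so \s{2,} also swallows runs of paragraph marks. The sketch below compares that pattern with a narrower character class on a plain string; CollapseRuns and the sample text are hypothetical, made up for illustration only.

```vba
' Hypothetical helper: replaces every match of the given pattern with one space.
Function CollapseRuns(ByVal pattern As String, ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = pattern
    CollapseRuns = re.Replace(s, " ")
End Function

Sub DemoWhitespaceClass()
    Dim sample As String
    sample = "John  Smith" & vbCr & vbCr & "J.A. Smith"

    ' \s also matches vbCr, so the two paragraph marks become a single space.
    Debug.Print Replace(CollapseRuns("\s{2,}", sample), vbCr, "<CR>")
    ' [ \t] matches only spaces and tabs, so the paragraph marks survive.
    Debug.Print Replace(CollapseRuns("[ \t]{2,}", sample), vbCr, "<CR>")
End Sub
```

The first line printed is John Smith J.A. Smith, the second is John Smith<CR><CR>J.A. Smith.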
Answer 0 (score: 1)
Try:
Sub CleanUpPastedText()
Application.ScreenUpdating = False
With Selection.Range.Find
.ClearFormatting
.Replacement.ClearFormatting
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
'Eliminate spaces & tabs before paragraph breaks.
.Text = "[ ^s^t]{1,}^13"
.Replacement.Text = "^p"
.Execute Replace:=wdReplaceAll
'Replace single paragraph breaks with a space
.Text = "([!^13])([^13])([!^13])"
.Replacement.Text = "\1 \3"
.Execute Replace:=wdReplaceAll
'Replace all double spaces with single spaces
.Text = "[ ]{2,}"
.Replacement.Text = " "
.Execute Replace:=wdReplaceAll
'Delete hyphens in hyphenated text formerly split across lines
.Text = "([a-z])-[ ]{1,}([a-z])"
.Replacement.Text = "\1\2"
.Execute Replace:=wdReplaceAll
'Limit paragraph breaks to one per 'real' paragraph.
.Text = "[^13]{1,}"
.Replacement.Text = "^p"
.Execute Replace:=wdReplaceAll
End With
'Restore Screen Updating
Application.ScreenUpdating = True
End Sub
Edit:
The code above works in Word 2007 and 2010. For Office 16, you should use ^013 instead of ^13.
Answer 1 (score: 0)
I finally managed to clean up the text. Let's consider the following text:
John Smith
John Anderson Smith
John A. Smith
J.A. Smith
<below many new line character, some lines with spaces, tabs>
The code from the other answer produces:
John Smith
John Anderson Smith John A. Smith J.A. Smith
This is obviously not correct. Here is my code:
With Selection
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[ \t]+"
.Text = RegEx.Replace(.Text, " ")
RegEx.MultiLine = True
RegEx.Pattern = "^(?:[\t ]*(?:\r?\n|\r))+"
.Text = RegEx.Replace(.Text, "")
' the following is from http://stackoverflow.com/a/24049145/2657875
RegEx.Pattern = "^[\s\xA0]+|[\s\xA0]+$"
.Text = RegEx.Replace(.Text, "")
End With
The code produces the following expected result.
Jim Kane
John Andy Lemar
J. A. Smith
Jane Smith
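As an alternative to stacking regular expressions, the same clean-up can be sketched line by line with Split and Join. This is not the code above, only an illustration: CleanLines is a hypothetical helper name, it assumes lines are separated by vbCr (as in text read from a Word selection), and it does not treat non-breaking spaces specially.

```vba
' Hypothetical helper: trims each line, collapses inner blanks, drops empty lines.
Function CleanLines(ByVal s As String) As String
    Dim parts() As String, kept() As String
    Dim txt As String
    Dim i As Long, n As Long

    If Len(s) = 0 Then Exit Function

    ' Normalize other line endings to vbCr and turn tabs into spaces.
    s = Replace(s, vbCrLf, vbCr)
    s = Replace(s, vbLf, vbCr)
    s = Replace(s, vbTab, " ")

    parts = Split(s, vbCr)
    ReDim kept(0 To UBound(parts))

    For i = LBound(parts) To UBound(parts)
        txt = Trim$(parts(i))               ' leading and trailing spaces
        Do While InStr(txt, "  ") > 0       ' runs of spaces inside the line
            txt = Replace(txt, "  ", " ")
        Loop
        If Len(txt) > 0 Then                ' skip lines that ended up empty
            kept(n) = txt
            n = n + 1
        End If
    Next i

    If n = 0 Then Exit Function
    ReDim Preserve kept(0 To n - 1)
    CleanLines = Join(kept, vbCr)
End Function

Sub DemoCleanLines()
    Dim raw As String
    raw = vbTab & "John   Smith " & vbCr & vbCr & "  " & vbCr & "J.A. Smith" & vbCr
    Debug.Print Replace(CleanLines(raw), vbCr, " | ")   ' prints John Smith | J.A. Smith
End Sub
```

In a document this could be applied with Selection.Text = CleanLines(Selection.Text). Note that the sketch also drops a trailing paragraph mark, which may or may not be wanted.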