重复删除VBA Word无法有效工作

时间:2015-11-06 08:30:09

标签: vba ms-word word-vba

我有一个删除重复项的程序,一切正常。它只是冻结大数据集,即1到250万字。

我的做法有什么问题?还有更好的吗?

Sub DeleteDuplicateParagraphs()

Dim p1 As Paragraph
Dim p2 As Paragraph
Dim DupCount As Long
Dim StartTime As Double
Dim SecondsElapsed As Double

StartTime = Timer

For Each p1 In ActiveDocument.Paragraphs
  If p1.range.Text <> vbCr Then

    For Each p2 In ActiveDocument.Paragraphs
      If p1.range.Text = p2.range.Text Then
        DupCount = DupCount + 1
        If p1.range.Text = p2.range.Text And DupCount > 1 Then p2.range.Delete
      End If
    Next p2
  End If
  DupCount = 0
Next p1

SecondsElapsed = Round(Timer - StartTime, 2)

MsgBox "This code ran successfully in " & SecondsElapsed & " seconds", vbInformation
DupCount = 0

End Sub

2 个答案:

答案 0 :(得分:0)

尝试此操作(首先将Microsoft Scripting Runtime的引用添加到您的VBA项目中):

Sub DeleteDuplicateParagraphs()
  Dim p As Paragraph
  Dim d As New Scripting.Dictionary
  Dim t As Variant
  Dim i As Integer
  Dim StartTime As Single

  StartTime = Timer

  ' collect duplicates
  For Each p In ActiveDocument.Paragraphs
    t = p.Range.Text
    If t <> vbCr Then
      If Not d.Exists(t) Then d.Add t, New Scripting.Dictionary
      d(t).Add d(t).Count + 1, p
    End If
  Next

  ' eliminate duplicates
  Application.ScreenUpdating = False
  For Each t In d
    For i = 2 To d(t).Count
      d(t)(i).Range.Delete
    Next
  Next
  Application.ScreenUpdating = True

  MsgBox "This code ran successfully in " & Round(Timer - StartTime, 2) & " seconds", vbInformation
End Sub

这利用了Scripting.Dictionary是一个哈希表的事实,该哈希表适用于非常快速地将唯一键与值相关联。因此,它非常擅长发现重复的密钥。字典键必须是字符串,方便我们使用段落文本。

对于值,我们使用更多的字典对象,仅仅因为它们比VBA的数组工作得更好。在其中,我们使用相同的文本收集对实际段落实例的引用。

事实上,删除重复的段落是一件非常简单的事情。

注意:上面代码中的重复检测部分非常快。但是,如果Word在大型文档中没有响应,那么它就在重复删除部分中,即因为Word的撤消缓冲区。

罪魁祸首是段落范围一个接一个地删除,导致Word构建一个非常大的撤消缓冲区。不幸的是,(我知道)无论如何都

  • 在一个步骤中删除多个单独的范围(这将导致撤消缓冲区中只有一个条目),或
  • 从VBA
  • 完全禁用撤消缓冲区

定期在&#34中调用UndoClear;消除重复&#34;循环可能会有所帮助,禁用ScreenUpdating也不是一个坏主意:

' eliminate duplicates
Dim x As Integer

Application.ScreenUpdating = False
For Each t In d
  x = x + 1
  For i = 2 To d(t).Count
    d(t)(i).Range.Delete
  Next
  If x Mod 50 = 0 Then ActiveDocument.UndoClear
Next
ActiveDocument.UndoClear
Application.ScreenUpdating = True

答案 1 :(得分:0)

首先,我非常感谢你花时间和精力帮助我。

你对这种方法背后的想法非常令人印象深刻。我确实略微更改了代码,并希望您在有空的时候仔细阅读它,看它是否符合最佳标准。再一次,我真的非常感谢你,代码比之前的代码运行了20次分割,甚至没有更大的数据集。

   >  Sub DeleteDuplicateParagraphs()
    > 
    >   Dim p As Paragraph 

>     Set d = CreateObject("Scripting.Dictionary")  
>         Dim t As Variant   
>     Dim i As Integer   
>     Dim StartTime As Single
    > 
    >   StartTime = Timer
    > 
    >   ' collect duplicates   For Each p In ActiveDocument.Paragraphs
    >     t = p.range.Text
    >     If t <> vbCr Then
    >       If Not d.Exists(t) Then d.Add t, CreateObject("Scripting.Dictionary")
    >       d(t).Add d(t).Count + 1, p
    >     End If   Next
    > 
    >   ' eliminate duplicates   For Each t In d
    >     For i = 2 To d(t).Count
    >       d(t)(i).range.Delete
    >     Next   Next
    > 
    >   MsgBox "This code ran successfully in " & Round(Timer - StartTime,
    > 2) & " seconds", vbInformation
    > 
    > End Sub