我有一个删除重复项的程序,一切正常。它只是冻结大数据集,即1到250万字。
我的做法有什么问题?还有更好的吗?
Sub DeleteDuplicateParagraphs()
Dim p1 As Paragraph
Dim p2 As Paragraph
Dim DupCount As Long
Dim StartTime As Double
Dim SecondsElapsed As Double
StartTime = Timer
For Each p1 In ActiveDocument.Paragraphs
If p1.range.Text <> vbCr Then
For Each p2 In ActiveDocument.Paragraphs
If p1.range.Text = p2.range.Text Then
DupCount = DupCount + 1
If p1.range.Text = p2.range.Text And DupCount > 1 Then p2.range.Delete
End If
Next p2
End If
DupCount = 0
Next p1
SecondsElapsed = Round(Timer - StartTime, 2)
MsgBox "This code ran successfully in " & SecondsElapsed & " seconds", vbInformation
DupCount = 0
End Sub
答案 0 :(得分:0)
尝试此操作(首先将Microsoft Scripting Runtime的引用添加到您的VBA项目中):
Sub DeleteDuplicateParagraphs()
Dim p As Paragraph
Dim d As New Scripting.Dictionary
Dim t As Variant
Dim i As Integer
Dim StartTime As Single
StartTime = Timer
' collect duplicates
For Each p In ActiveDocument.Paragraphs
t = p.Range.Text
If t <> vbCr Then
If Not d.Exists(t) Then d.Add t, New Scripting.Dictionary
d(t).Add d(t).Count + 1, p
End If
Next
' eliminate duplicates
Application.ScreenUpdating = False
For Each t In d
For i = 2 To d(t).Count
d(t)(i).Range.Delete
Next
Next
Application.ScreenUpdating = True
MsgBox "This code ran successfully in " & Round(Timer - StartTime, 2) & " seconds", vbInformation
End Sub
这利用了Scripting.Dictionary
是一个哈希表的事实,该哈希表适用于非常快速地将唯一键与值相关联。因此,它非常擅长发现重复的密钥。字典键必须是字符串,方便我们使用段落文本。
对于值,我们使用更多的字典对象,仅仅因为它们比VBA的数组工作得更好。在其中,我们使用相同的文本收集对实际段落实例的引用。
事实上,删除重复的段落是一件非常简单的事情。
注意:上面代码中的重复检测部分非常快。但是,如果Word在大型文档中没有响应,那么它就在重复删除部分中,即因为Word的撤消缓冲区。
罪魁祸首是段落范围一个接一个地删除,导致Word构建一个非常大的撤消缓冲区。不幸的是,(我知道)无论如何都
定期在&#34中调用UndoClear
;消除重复&#34;循环可能会有所帮助,禁用ScreenUpdating
也不是一个坏主意:
' eliminate duplicates
Dim x As Integer
Application.ScreenUpdating = False
For Each t In d
x = x + 1
For i = 2 To d(t).Count
d(t)(i).Range.Delete
Next
If x Mod 50 = 0 Then ActiveDocument.UndoClear
Next
ActiveDocument.UndoClear
Application.ScreenUpdating = True
答案 1 :(得分:0)
首先,我非常感谢你花时间和精力帮助我。
你对这种方法背后的想法非常令人印象深刻。我确实略微更改了代码,并希望您在有空的时候仔细阅读它,看它是否符合最佳标准。再一次,我真的非常感谢你,代码比之前的代码运行了20次分割,甚至没有更大的数据集。
> Sub DeleteDuplicateParagraphs()
>
> Dim p As Paragraph
> Set d = CreateObject("Scripting.Dictionary")
> Dim t As Variant
> Dim i As Integer
> Dim StartTime As Single
>
> StartTime = Timer
>
> ' collect duplicates For Each p In ActiveDocument.Paragraphs
> t = p.range.Text
> If t <> vbCr Then
> If Not d.Exists(t) Then d.Add t, CreateObject("Scripting.Dictionary")
> d(t).Add d(t).Count + 1, p
> End If Next
>
> ' eliminate duplicates For Each t In d
> For i = 2 To d(t).Count
> d(t)(i).range.Delete
> Next Next
>
> MsgBox "This code ran successfully in " & Round(Timer - StartTime,
> 2) & " seconds", vbInformation
>
> End Sub