我有一个宏,它通过word文档的段落。此代码旨在传递段落,标识其大纲级别并在找到所需的段落大纲级别时检索内容。有了这些信息,我正在填充一个列表框,允许用户选择从哪个点导出文档中的某些文本。
此功能正在运行,但是,我正在寻找一种提高速度的方法。现在我正在处理一个包含5678段的文档,处理所有信息需要30多分钟。你有什么建议吗?
我试图接近而没有取得成功:
1 - 我尝试使用对象TableOfContents,但是我无法获得干净的信息并将大纲级别与段落区分开来。
2 - 我试图调整此处Getting the headings from a Word document的代码,特别是因为使用了命令_docSource.GetCrossReferenceItems(wdRefTypeHeading),也没有成功
这里有表单的图像和我正在使用的代码。
Sub ProcessHeaders()
Dim j As Long
Dim Paragraph_Number() As Variant
Dim Paragraph_Content() As Variant
Dim Paragraph_Mapping() As Variant
j = 1
With UserForm1
If .ComboBox4.ListCount > 0 Then
.ComboBox4.Clear
End If
For i = 1 To wordDoc.Paragraphs.Count
If wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel1 _
Or wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel2 _
Or wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel3 _
Or wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel4 Then
If wordDoc.Paragraphs.Item(i).Range.ListFormat.ListString <> "" Then
ReDim Preserve Paragraph_Number(j)
ReDim Preserve Paragraph_Content(j)
Paragraph_Content(j) = wordDoc.Paragraphs.Item(i).Range.ListFormat.ListString & " " & Trim(Left(wordDoc.Paragraphs.Item(i).Range.Text, (Len(wordDoc.Paragraphs.Item(i).Range.Text) - 1)))
Paragraph_Number(j) = i
j = j + 1
End If
End If
Next i
ReDim Preserve Paragraph_Mapping(1 To UBound(Paragraph_Content), 1)
For i = 1 To UBound(Paragraph_Number)
Paragraph_Mapping(i, 0) = Paragraph_Content(i)
Paragraph_Mapping(i, 1) = Paragraph_Number(i)
Next i
.ComboBox4.List = Paragraph_Mapping
End With
End Sub
编辑1 - 我使用下面的代码实现将执行时间从32分钟缩短到8分钟。有什么建议可以改善吗?提前致谢
Sub ProcessHeaders()
Dim j As Long
Dim thisOutlineLevel As WdOutlineLevel
Dim thisHeader As String
Dim thisList As String
Dim ParagraphCount As Long
Dim Paragraph_Number_Base() As Variant
Dim Paragraph_Content_Base() As Variant
Dim Paragraph_ListItem_Base() As Variant
Dim ParagraphContent() As Variant
Dim ParagraphNumber() As Variant
Dim Paragraph_Mapping() As Variant
Dim StartTime As Double
Dim MinutesElapsed As String
j = 1
With UserForm1
If .ComboBox4.ListCount > 0 Then
.ComboBox4.Clear
End If
ParagraphCount = wordDoc.Paragraphs.Count
ReDim Paragraph_Content_Base(ParagraphCount + 1)
ReDim Paragraph_ListItem_Base(ParagraphCount + 1)
ReDim Paragraph_Number_Base(ParagraphCount + 1)
StartTime = Timer
For i = 1 To ParagraphCount
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
UserForm1.Label7.Caption = "Reading Paragraphs. " & Format(i / ParagraphCount, "0%") & " | Total of Paragraphs Found: " & ParagraphCount & " | Time Elapsed: " _
& MinutesElapsed & " Minutes"
With wordDoc.Paragraphs.Item(i)
Select Case .OutlineLevel
Case wdOutlineLevelBodyText
GoTo ResumeNext
Case wdOutlineLevel1, wdOutlineLevel2, wdOutlineLevel3, wdOutlineLevel4
Paragraph_Content_Base(i) = .Range.Text
Paragraph_ListItem_Base(i) = .Range.ListFormat.ListString
Paragraph_Number_Base(i) = i
End Select
End With
ResumeNext:
Next i
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
UserForm1.Label7.Caption = ParagraphCount & " read on " & MinutesElapsed & " Minutes. Now, identifying the Headers"
For i = 0 To UBound(Paragraph_Content_Base)
If Paragraph_Content_Base(i) <> "" And Paragraph_ListItem_Base(i) <> "" Then
ReDim Preserve ParagraphContent(j)
ReDim Preserve ParagraphNumber(j)
ParagraphContent(j) = Trim(Paragraph_ListItem_Base(i)) & " " & Trim(Left(Paragraph_Content_Base(i), Len(Paragraph_Content_Base(i)) - 1))
ParagraphNumber(j) = Paragraph_Number_Base(i)
j = j + 1
End If
Next i
Erase Paragraph_Content_Base
Erase Paragraph_ListItem_Base
Erase Paragraph_Number_Base
ReDim Preserve Paragraph_Mapping(1 To UBound(ParagraphContent), 1)
For i = 1 To UBound(ParagraphNumber)
Paragraph_Mapping(i, 0) = ParagraphContent(i)
Paragraph_Mapping(i, 1) = ParagraphNumber(i)
Next i
.ComboBox4.List = Paragraph_Mapping
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
UserForm1.Label7.Caption = "Identifying Headers: " & j & " identified. Total Time: " & MinutesElapsed & " minutes"
End With
编辑2 - 在Cindy的帮助下,最初在32分钟内运行的代码现在运行32秒。这是最终的代码。
Sub ProcessHeaders()
Dim rng As Word.Range
Dim para As Word.Paragraph
Dim lstFormat As Word.ListFormat
Dim paraNr() As Variant
Dim paraContent() As Variant
Dim counter As Long, paraIndex As Long
Dim Paragraph_Mapping() As Variant
Dim ParagraphCount As Long
Dim i, j As Long
Dim StartTime As Double
Dim StartRealTime As Date
Dim MinutesElapsed As String
With UserForm1
If .ComboBox4.ListCount > 0 Then
.ComboBox4.Clear
End If
counter = 1
paraIndex = 1
i = 0
j = 1
StartTime = Timer
StartRealTime = Now
Set rng = wordDoc.Content
ParagraphCount = rng.ListParagraphs.Count
For Each para In rng.ListParagraphs
i = i + 1
Set lstFormat = para.Range.ListFormat
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
.Label7.Caption = "Reading Paragraphs. " & Format(i / ParagraphCount, "0%") & " | Total of Paragraphs Found: " & ParagraphCount & _
" | Start Time: " & StartRealTime & " | Time Elapsed: " & MinutesElapsed & " Minutes"
'CheckOutLine = rng.ListParagraphs.Item(1).OutlineLevel
If lstFormat.ListString <> "" And Len(lstFormat.ListString) >= 2 Then
ReDim Preserve paraNr(counter)
ReDim Preserve paraContent(counter)
paraContent(counter) = lstFormat.ListString & " " _
& Trim(Left(para.Range.Text, (Len(para.Range.Text) - 1)))
paraNr(counter) = i
wordDoc.Bookmarks.Add Name:="ExpContent" & i, Range:=para.Range
counter = counter + 1
End If
paraIndex = paraIndex + 1
Next
j = 1
ReDim Preserve Paragraph_Mapping(1 To UBound(paraNr), 1)
For i = UBound(paraNr) To 1 Step -1
Paragraph_Mapping(j, 0) = paraContent(i)
Paragraph_Mapping(j, 1) = paraNr(i)
j = j + 1
Next i
.ComboBox4.List = Paragraph_Mapping
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
.Label7.Caption = "Identifying Headers: " & j & " identified. Total Time: " & MinutesElapsed & " minutes"
End With
'
' For counter = 1 To UBound(paraNr)
' Debug.Print paraNr(counter) & vbTab & paraContent(counter)
' Next
End Sub
在用户选择段落后,此调用正在管理书签
With objWord.Selection
BookmarkID = "ExpContent" & PositionReference
wordDoc.Bookmarks(BookmarkID).Select
.InsertParagraphBefore
End With
再一次,谢谢
答案 0 :(得分:0)
我认为最快的方法是仅循环编号段落,而不是所有段落。这可以使用ListParagraphs
对象完成。例如:
Sub IdOutlineLevels()
Dim rng As word.Range
Dim para As word.Paragraph
Dim lstFormat As word.ListFormat
Dim paraNr() As Variant
Dim paraContent() As Variant
Dim counter As Long, paraIndex As Long
counter = 1
paraIndex = 1
Set rng = ActiveDocument.content
For Each para In rng.ListParagraphs
Set lstFormat = para.Range.ListFormat
Select Case lstFormat.ListLevelNumber
Case 1, 2, 3, 4
If lstFormat.ListString <> "" Then
ReDim Preserve paraNr(counter)
ReDim Preserve paraContent(counter)
paraContent(counter) = lstFormat.ListString & " " _
& Trim(Left(para.Range.Text, (Len(para.Range.Text) - 1)))
paraNr(counter) = paraIndex
counter = counter + 1
ActiveDocument.Bookmarks.Add Name:="ExpContent" & counter, Range:=para.Range
End If
Case Else
End Select
paraIndex = paraIndex + 1
Next
For counter = 1 To UBound(paraNr)
Debug.Print paraNr(counter) & vbTab & paraContent(counter)
Next
End Sub
我没有依赖文档中段落的索引号再次找到段落,而是使用与段落编号相同的“计数器”为每个段落添加了书签。这是Word本身用于交叉引用的技术。