识别具有特定大纲级别的段落内容

时间:2016-05-24 01:25:35

标签: vba ms-word word-vba

我有一个宏,它通过word文档的段落。此代码旨在传递段落,标识其大纲级别并在找到所需的段落大纲级别时检索内容。有了这些信息,我正在填充一个列表框,允许用户选择从哪个点导出文档中的某些文本。

此功能正在运行,但是,我正在寻找一种提高速度的方法。现在我正在处理一个包含5678段的文档,处理所有信息需要30多分钟。你有什么建议吗?

我试图接近而没有取得成功:

1 - 我尝试使用对象TableOfContents,但是我无法获得干净的信息并将大纲级别与段落区分开来。

2 - 我试图调整此处Getting the headings from a Word document的代码,特别是因为使用了命令_docSource.GetCrossReferenceItems(wdRefTypeHeading),也没有成功

这里有表单的图像和我正在使用的代码。 Form of execution

Sub ProcessHeaders()
Dim j As Long
Dim Paragraph_Number() As Variant
Dim Paragraph_Content() As Variant
Dim Paragraph_Mapping() As Variant
j = 1
With UserForm1
If .ComboBox4.ListCount > 0 Then
    .ComboBox4.Clear
End If

For i = 1 To wordDoc.Paragraphs.Count
If wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel1 _
Or wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel2 _
Or wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel3 _
Or wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel4 Then
If wordDoc.Paragraphs.Item(i).Range.ListFormat.ListString <> "" Then
    ReDim Preserve Paragraph_Number(j)
    ReDim Preserve Paragraph_Content(j)
    Paragraph_Content(j) = wordDoc.Paragraphs.Item(i).Range.ListFormat.ListString & " " & Trim(Left(wordDoc.Paragraphs.Item(i).Range.Text, (Len(wordDoc.Paragraphs.Item(i).Range.Text) - 1)))
    Paragraph_Number(j) = i
    j = j + 1
End If
End If
Next i

    ReDim Preserve Paragraph_Mapping(1 To UBound(Paragraph_Content), 1)
    For i = 1 To UBound(Paragraph_Number)
    Paragraph_Mapping(i, 0) = Paragraph_Content(i)
    Paragraph_Mapping(i, 1) = Paragraph_Number(i)

    Next i

.ComboBox4.List = Paragraph_Mapping
End With
End Sub

编辑1 - 我使用下面的代码实现将执行时间从32分钟缩短到8分钟。有什么建议可以改善吗?提前致谢

Sub ProcessHeaders()
Dim j As Long
Dim thisOutlineLevel As WdOutlineLevel
Dim thisHeader As String
Dim thisList As String
Dim ParagraphCount As Long

Dim Paragraph_Number_Base() As Variant
Dim Paragraph_Content_Base() As Variant
Dim Paragraph_ListItem_Base() As Variant

Dim ParagraphContent() As Variant
Dim ParagraphNumber() As Variant
Dim Paragraph_Mapping() As Variant

Dim StartTime As Double
Dim MinutesElapsed As String



j = 1
With UserForm1
If .ComboBox4.ListCount > 0 Then
    .ComboBox4.Clear
End If

ParagraphCount = wordDoc.Paragraphs.Count

ReDim Paragraph_Content_Base(ParagraphCount + 1)
ReDim Paragraph_ListItem_Base(ParagraphCount + 1)
ReDim Paragraph_Number_Base(ParagraphCount + 1)


StartTime = Timer
For i = 1 To ParagraphCount
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
UserForm1.Label7.Caption = "Reading Paragraphs.  " & Format(i / ParagraphCount, "0%") & " | Total of Paragraphs Found: " & ParagraphCount & " | Time Elapsed: " _
& MinutesElapsed & " Minutes"
With wordDoc.Paragraphs.Item(i)
    Select Case .OutlineLevel
        Case wdOutlineLevelBodyText
            GoTo ResumeNext
        Case wdOutlineLevel1, wdOutlineLevel2, wdOutlineLevel3, wdOutlineLevel4
            Paragraph_Content_Base(i) = .Range.Text
            Paragraph_ListItem_Base(i) = .Range.ListFormat.ListString
            Paragraph_Number_Base(i) = i
    End Select
End With

ResumeNext:
Next i
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
UserForm1.Label7.Caption = ParagraphCount & " read on " & MinutesElapsed & " Minutes. Now, identifying the Headers"

For i = 0 To UBound(Paragraph_Content_Base)
    If Paragraph_Content_Base(i) <> "" And Paragraph_ListItem_Base(i) <> "" Then
        ReDim Preserve ParagraphContent(j)
        ReDim Preserve ParagraphNumber(j)
        ParagraphContent(j) = Trim(Paragraph_ListItem_Base(i)) & " " & Trim(Left(Paragraph_Content_Base(i), Len(Paragraph_Content_Base(i)) - 1))
        ParagraphNumber(j) = Paragraph_Number_Base(i)
        j = j + 1
    End If
Next i


Erase Paragraph_Content_Base
Erase Paragraph_ListItem_Base
Erase Paragraph_Number_Base

    ReDim Preserve Paragraph_Mapping(1 To UBound(ParagraphContent), 1)
    For i = 1 To UBound(ParagraphNumber)
        Paragraph_Mapping(i, 0) = ParagraphContent(i)
        Paragraph_Mapping(i, 1) = ParagraphNumber(i)
    Next i

.ComboBox4.List = Paragraph_Mapping
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
UserForm1.Label7.Caption = "Identifying Headers: " & j & " identified. Total Time: " & MinutesElapsed & " minutes"
End With

编辑2 - 在Cindy的帮助下,最初在32分钟内运行的代码现在运行32秒。这是最终的代码。

Sub ProcessHeaders()
    Dim rng As Word.Range
    Dim para As Word.Paragraph
    Dim lstFormat As Word.ListFormat
    Dim paraNr() As Variant
    Dim paraContent() As Variant
    Dim counter As Long, paraIndex As Long

    Dim Paragraph_Mapping() As Variant
    Dim ParagraphCount As Long
    Dim i, j As Long

    Dim StartTime As Double
    Dim StartRealTime As Date
    Dim MinutesElapsed As String

    With UserForm1
    If .ComboBox4.ListCount > 0 Then
        .ComboBox4.Clear
    End If

    counter = 1
    paraIndex = 1
    i = 0
    j = 1
    StartTime = Timer
    StartRealTime = Now
    Set rng = wordDoc.Content
    ParagraphCount = rng.ListParagraphs.Count

    For Each para In rng.ListParagraphs
        i = i + 1
        Set lstFormat = para.Range.ListFormat
        MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
        .Label7.Caption = "Reading Paragraphs.  " & Format(i / ParagraphCount, "0%") & " | Total of Paragraphs Found: " & ParagraphCount & _
        " | Start Time: " & StartRealTime & " | Time Elapsed: " & MinutesElapsed & " Minutes"
        'CheckOutLine = rng.ListParagraphs.Item(1).OutlineLevel
                If lstFormat.ListString <> "" And Len(lstFormat.ListString) >= 2 Then
                    ReDim Preserve paraNr(counter)
                    ReDim Preserve paraContent(counter)
                    paraContent(counter) = lstFormat.ListString & " " _
                                           & Trim(Left(para.Range.Text, (Len(para.Range.Text) - 1)))
                    paraNr(counter) = i
                    wordDoc.Bookmarks.Add Name:="ExpContent" & i, Range:=para.Range
                    counter = counter + 1
                End If
        paraIndex = paraIndex + 1
    Next
j = 1

    ReDim Preserve Paragraph_Mapping(1 To UBound(paraNr), 1)
    For i = UBound(paraNr) To 1 Step -1
        Paragraph_Mapping(j, 0) = paraContent(i)
        Paragraph_Mapping(j, 1) = paraNr(i)
        j = j + 1
    Next i
    .ComboBox4.List = Paragraph_Mapping
    MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
    .Label7.Caption = "Identifying Headers: " & j & " identified. Total Time: " & MinutesElapsed & " minutes"
    End With

'
'    For counter = 1 To UBound(paraNr)
'        Debug.Print paraNr(counter) & vbTab & paraContent(counter)
'    Next
End Sub

在用户选择段落后,此调用正在管理书签

With objWord.Selection
        BookmarkID = "ExpContent" & PositionReference
        wordDoc.Bookmarks(BookmarkID).Select
       .InsertParagraphBefore
End With

Form of Execution 1

再一次,谢谢

1 个答案:

答案 0 :(得分:0)

我认为最快的方法是仅循环编号段落,而不是所有段落。这可以使用ListParagraphs对象完成。例如:

Sub IdOutlineLevels()
    Dim rng As word.Range
    Dim para As word.Paragraph
    Dim lstFormat As word.ListFormat
    Dim paraNr() As Variant
    Dim paraContent() As Variant
    Dim counter As Long, paraIndex As Long

    counter = 1
    paraIndex = 1
    Set rng = ActiveDocument.content
    For Each para In rng.ListParagraphs
        Set lstFormat = para.Range.ListFormat
        Select Case lstFormat.ListLevelNumber
            Case 1, 2, 3, 4
                If lstFormat.ListString <> "" Then
                    ReDim Preserve paraNr(counter)
                    ReDim Preserve paraContent(counter)
                    paraContent(counter) = lstFormat.ListString & " " _
                                           & Trim(Left(para.Range.Text, (Len(para.Range.Text) - 1)))
                    paraNr(counter) = paraIndex
                    counter = counter + 1
                    ActiveDocument.Bookmarks.Add Name:="ExpContent" & counter, Range:=para.Range
                End If
            Case Else
        End Select
        paraIndex = paraIndex + 1
    Next

    For counter = 1 To UBound(paraNr)
        Debug.Print paraNr(counter) & vbTab & paraContent(counter)
    Next
End Sub

我没有依赖文档中段落的索引号再次找到段落,而是使用与段落编号相同的“计数器”为每个段落添加了书签。这是Word本身用于交叉引用的技术。