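VB.Net: Searching a Word document line by line

Time: 2017-01-04 19:06:01

Tags: vb.net file search ms-word streamreader

I am trying to read a Word document (800+ pages) line by line, and if a line contains certain text, in this case Section, simply print that line to the console.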

Public Sub doIt()
    SearchFile("theFilePath", "Section")
    Console.WriteLine("SHit")
End Sub

Public Sub SearchFile(ByVal strFilePath As String, ByVal strSearchTerm As String)
    Dim sr As StreamReader = New StreamReader(strFilePath)
    Dim strLine As String = String.Empty

    For Each line As String In sr.ReadLine
        If line.Contains(strSearchTerm) = True Then
            Console.WriteLine(line)
        End If
    Next

End Sub

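It runs, but it doesn't print anything. I know the word "Section" appears in there many times.

2 Answers:

Answer 0: (score: 3)

As already mentioned in the comments, you can't search a Word document the way you are currently doing it. You need to create a Word.Application object, as mentioned, and then load the document so that you can search it.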
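To spell out why the posted loop prints nothing: sr.ReadLine returns a single String, so For Each walks the characters of that one line, and a one-character string never contains "Section". On top of that, a .docx file is a ZIP archive (and a .doc is a binary format), so its raw bytes are not lines of readable text. For comparison, here is a minimal sketch of the line-by-line search as it would look for a plain-text file. FindMatchingLines and the PlainTextSearch module are hypothetical names for illustration, and this will not work on a Word file.

```vb
Imports System.Collections.Generic
Imports System.IO

Module PlainTextSearch
    ' Hypothetical helper: returns every line of a PLAIN-TEXT file that contains searchTerm.
    ' Assumption: the file is ordinary text, not a .doc/.docx document.
    Public Function FindMatchingLines(ByVal filePath As String, ByVal searchTerm As String) As List(Of String)
        Dim matches As New List(Of String)

        ' File.ReadLines yields the file one line at a time,
        ' unlike sr.ReadLine, which returns only a single line.
        For Each line As String In File.ReadLines(filePath)
            If line.Contains(searchTerm) Then
                matches.Add(line)
            End If
        Next

        Return matches
    End Function
End Module
```

Calling FindMatchingLines("theFilePath", "Section") and writing each result with Console.WriteLine would give the behaviour the question expected, but only for text files. For an actual Word document, the file has to be opened through Word itself.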

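Here is a short example I wrote for you. Note that you need to add a reference to Microsoft.Office.Interop.Word and then add the Imports statement to your class, for example Imports Microsoft.Office.Interop. Also, this grabs each paragraph and then uses its range to find the word you are searching for; if the word is found, the paragraph is added to the list.

Note: tried and tested - I have this in a button event, but put it wherever you need it.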

Dim objWordApp As Word.Application = Nothing
Dim objDoc As Word.Document = Nothing
Dim TextToFind As String = "YOUR TEXT TO FIND"
Dim TextRange As Word.Range = Nothing
Dim StringLines As New List(Of String)

Try
    objWordApp = New Word.Application()
    objWordApp.Visible = False
    'FileName is a placeholder for the full path to your document
    objDoc = objWordApp.Documents.Open(FileName)

    'loop through each paragraph in the document and get the range
    For Each p As Word.Paragraph In objDoc.Paragraphs
        TextRange = p.Range
        TextRange.Find.ClearFormatting()

        'Execute returns True when the text is found inside this paragraph
        If TextRange.Find.Execute(TextToFind) Then
            StringLines.Add(p.Range.Text)
        End If
    Next

    If StringLines.Count > 0 Then
        MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
    End If

Catch ex As Exception
    'publish your exception?
Finally
    'close the document and quit Word even if something failed above
    If objDoc IsNot Nothing Then objDoc.Close()
    If objWordApp IsNot Nothing Then objWordApp.Quit()
End Try

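Updated to use sentences - this goes through each paragraph and grabs each sentence, and then we can check whether the word exists in it. The benefit of this is that it is quicker, because we get each paragraph and then search its sentences. We have to get the paragraph in order to get the sentences.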

Dim objWordApp As Word.Application = Nothing
Dim objDoc As Word.Document = Nothing
Dim TextToFind As String = "YOUR TEXT TO FIND"
Dim TextRange As Word.Range = Nothing
Dim StringLines As New List(Of String)

Try
    objWordApp = New Word.Application()
    objWordApp.Visible = False
    'FileName is a placeholder for the full path to your document
    objDoc = objWordApp.Documents.Open(FileName)

    For Each p As Word.Paragraph In objDoc.Paragraphs
        TextRange = p.Range

        'Sentences is 1-based; walk it in document order
        For i As Integer = 1 To TextRange.Sentences.Count
            Dim sentence As String = TextRange.Sentences.Item(i).Text
            If sentence.Contains(TextToFind) Then
                StringLines.Add(sentence.Trim())
            End If
        Next
    Next

    If StringLines.Count > 0 Then
        MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
    End If

Catch ex As Exception
    'publish your exception?
Finally
    'close the document and quit Word even if something failed above
    If objDoc IsNot Nothing Then objDoc.Close()
    If objWordApp IsNot Nothing Then objWordApp.Quit()
End Try

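Answer 1: (score: 1)

Here is a sub that prints every line in which the search string is found, rather than every paragraph. It mimics the behaviour in your example of using a StreamReader to read and check each line: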

'Add a reference to Microsoft.Office.Interop.Word and add: Imports Microsoft.Office.Interop
Public Sub SearchFile(ByVal strFilePath As String, ByVal strSearchTerm As String)
    Dim wordObject As Word.Application = New Word.Application
    wordObject.Visible = False
    Dim objWord As Word.Document = wordObject.Documents.Open(strFilePath)
    'start with the selection at the first character of the document
    objWord.Characters(1).Select()

    Dim bolEOF As Boolean = False
    Do Until bolEOF
        'extend the selection to the end of the current displayed line
        wordObject.Selection.MoveEnd(Word.WdUnits.wdLine, 1)
        'case-insensitive comparison
        If wordObject.Selection.Text.ToUpper.Contains(strSearchTerm.ToUpper) Then
            'strip line-break characters before printing
            Console.WriteLine(wordObject.Selection.Text.Replace(vbCrLf, "").Replace(vbCr, "").Replace(vbLf, ""))
        End If
        'move to the start of the next line
        wordObject.Selection.Collapse(Word.WdCollapseDirection.wdCollapseEnd)
        If wordObject.Selection.Bookmarks.Exists("\EndOfDoc") Then
            bolEOF = True
        End If
    Loop

    objWord.Close()
    wordObject.Quit()
    objWord = Nothing
    wordObject = Nothing
End Sub

This is a slightly modified vb.net implementation of nawfal's solution to parsing word document lines.