我正在尝试逐行阅读Word文档(800多页),如果该行包含某些文本,在本例中为Section
,则只需将该行打印到控制台。
Public Sub doIt()
SearchFile("theFilePath", "Section")
Console.WriteLine("SHit")
End Sub
Public Sub SearchFile(ByVal strFilePath As String, ByVal strSearchTerm As String)
Dim sr As StreamReader = New StreamReader(strFilePath)
Dim strLine As String = String.Empty
For Each line As String In sr.ReadLine
If line.Contains(strSearchTerm) = True Then
Console.WriteLine(line)
End If
Next
End Sub
它运行,但它不打印任何东西。我知道“部分”这个词也在那里多次出现。
答案 0 :(得分:3)
正如评论中已经提到的,您无法按照当前的方式搜索Word
文档。您需要创建一个如上所述的Word.Application
对象,然后加载文档以便搜索它。
这是我为你写的一个简短的例子。请注意,您需要添加对 Microsoft.Office.Interop.Word 的引用,然后您需要将import语句添加到您的类。例如Imports Microsoft.Office.Interop
。此外,这会抓取每个段落,然后使用范围来查找您要搜索的单词,如果找到它会将其添加到列表中。
注意:经过测试和测试 - 我在按钮事件中有这个,但放在你需要的地方。
Try
Dim objWordApp As Word.Application = Nothing
Dim objDoc As Word.Document = Nothing
Dim TextToFind As String = YOURTEXT
Dim TextRange As Word.Range = Nothing
Dim StringLines As New List(Of String)
objWordApp = CreateObject("Word.Application")
If objWordApp IsNot Nothing Then
objWordApp.Visible = False
objDoc = objWordApp.Documents.Open(FileName, )
End If
If objDoc IsNot Nothing Then
'loop through each paragraph in the document and get the range
For Each p As Word.Paragraph In objDoc.Paragraphs
TextRange = p.Range
TextRange.Find.ClearFormatting()
If TextRange.Find.Execute(TextToFind, ) Then
StringLines.Add(p.Range.Text)
End If
Next
If StringLines.Count > 0 Then
MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
End If
objDoc.Close()
objWordApp.Quit()
End If
Catch ex As Exception
'publish your exception?
End Try
更新以使用句子 - 这将遍历每个段落并抓住每个句子,然后我们可以看到该词是否存在...这样做的好处是它更快,因为我们得到每个段落和然后搜索句子。我们必须得到段落才能得到句子......
Try
Dim objWordApp As Word.Application = Nothing
Dim objDoc As Word.Document = Nothing
Dim TextToFind As String = "YOUR TEXT TO FIND"
Dim TextRange As Word.Range = Nothing
Dim StringLines As New List(Of String)
Dim SentenceCount As Integer = 0
objWordApp = CreateObject("Word.Application")
If objWordApp IsNot Nothing Then
objWordApp.Visible = False
objDoc = objWordApp.Documents.Open(FileName, )
End If
If objDoc IsNot Nothing Then
For Each p As Word.Paragraph In objDoc.Paragraphs
TextRange = p.Range
TextRange.Find.ClearFormatting()
SentenceCount = TextRange.Sentences.Count
If SentenceCount > 0 Then
Do Until SentenceCount = 0
Dim sentence As String = TextRange.Sentences.Item(SentenceCount).Text
If sentence.Contains(TextToFind) Then
StringLines.Add(sentence.Trim())
End If
SentenceCount -= 1
Loop
End If
Next
If StringLines.Count > 0 Then
MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
End If
objDoc.Close()
objWordApp.Quit()
End If
Catch ex As Exception
'publish your exception?
End Try
答案 1 :(得分:1)
这是一个sub,它将打印找到搜索字符串的每一行,而不是每个段落。它将模仿在示例中使用streamreader来读取/检查每一行的行为:
'Add reference to and import Microsoft.Office.Interop.Word
Public Sub SearchFile(ByVal strFilePath As String, ByVal strSearchTerm As String)
Dim wordObject As Word.Application = New Word.Application
wordObject.Visible = False
Dim objWord As Word.Document = wordObject.Documents.Open(strFilePath)
objWord.Characters(1).Select()
Dim bolEOF As Boolean = False
Do Until bolEOF
wordObject.Selection.MoveEnd(WdUnits.wdLine, 1)
If wordObject.Selection.Text.ToUpper.Contains(strSearchTerm.ToUpper) Then
Console.WriteLine(wordObject.Selection.Text.Replace(vbCr, "").Replace(vbCr, "").Replace(vbCrLf, ""))
End If
wordObject.Selection.Collapse(WdCollapseDirection.wdCollapseEnd)
If wordObject.Selection.Bookmarks.Exists("\EndOfDoc") Then
bolEOF = True
End If
Loop
objWord.Close()
wordObject.Quit()
objWord = Nothing
wordObject = Nothing
Me.Close()
End Sub
的略微修改的vb.net实现