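I have VB.NET code that finds and replaces text in a Word document (.docx), using OpenXml. However, I want to replace only text wrapped in HTML-style tags, and the tags should always be removed once the new text has been inserted into the document.
My code is: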
    Imports System.IO
    Imports System.Text.RegularExpressions
    Imports DocumentFormat.OpenXml.Packaging

    Public Sub SearchAndReplace(ByVal document As String)
        ' Open the .docx file for editing
        Using wordDoc As WordprocessingDocument = WordprocessingDocument.Open(document, True)
            ' Read the main document part as raw XML
            Dim docText As String = Nothing
            Using sr As New StreamReader(wordDoc.MainDocumentPart.GetStream())
                docText = sr.ReadToEnd()
            End Using

            Dim regexText As New Regex("<ReplaceText>")
            docText = regexText.Replace(docText, "Hi Everyone!")

            ' Overwrite the main document part with the modified XML
            Using sw As New StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create))
                sw.Write(docText)
            End Using
        End Using
    End Sub
Answer 0 (score: 0)
Here is an approach that should help you solve the problem.
    Imports System.Text.RegularExpressions

    Module Module1
        Sub Main()
            Dim Text As String = "Blah<foo>Blah"
            ' Print the original text
            Console.WriteLine(Text)
            ' Capturing groups are the parenthesized parts of the pattern:
            ' group 1 is the opening "<", group 2 is the closing ">"
            Dim regex As New Regex("(<)[\w/]+(>)")
            ' Print the text with the content between groups 1 and 2 replaced
            Console.WriteLine(regex.Replace(Text, "$1foo has been replaced.$2"))
            ' Update Text
            Text = regex.Replace(Text, "$1foo has been replaced.$2")
            ' Remove the opening bracket (InStr is 1-based, Remove is 0-based)
            Dim p As Integer = InStr(Text, "<")
            Text = Text.Remove(p - 1, 1)
            ' Remove the closing bracket
            Dim pp As Integer = InStr(Text, ">")
            Text = Text.Remove(pp - 1, 1)
            ' Print the final text
            Console.WriteLine(Text)
            Console.ReadLine()
        End Sub
    End Module
Output:
    Blah<foo>Blah
    Blah<foo has been replaced.>Blah
    Blahfoo has been replaced.Blah
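If a line contains more than one tag, the code above will not work: InStr only finds the first "<" and the first ">", so only the first tag's brackets are removed.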
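One possible way around the multiple-tag limitation is to match each whole tag, brackets included, and replace it in a single pass, so no separate bracket-removal step is needed. The following is only a minimal sketch: the module name `TagReplaceSketch` and the helper `ReplaceAllTags` are hypothetical names introduced for illustration, and the pattern assumes tag names consist only of word characters and "/".

```vbnet
Imports System.Text.RegularExpressions

Module TagReplaceSketch
    ' Hypothetical helper (not part of the original answer): replaces every
    ' <tag> in the input with newText, dropping the angle brackets as well.
    Function ReplaceAllTags(input As String, newText As String) As String
        ' Assumption: a tag is "<", one or more word characters or "/", then ">"
        Dim tagPattern As New Regex("<[\w/]+>")
        ' The MatchEvaluator returns newText verbatim, so a "$" inside newText
        ' is not interpreted as a substitution token
        Return tagPattern.Replace(input, Function(m As Match) newText)
    End Function

    Sub Main()
        ' Both tags are replaced, and no brackets remain
        Console.WriteLine(ReplaceAllTags("Blah<foo>Blah<bar>Blah", "Hi Everyone!"))
        Console.ReadLine()
    End Sub
End Module
```

For the sample input this prints `BlahHi Everyone!BlahHi Everyone!Blah`. Note that this operates on a plain string. If it were run against the raw XML read from MainDocumentPart, as in the question, the pattern would need adjusting: a literal "<" typed into the document is stored in XML as the entity `&lt;`, and the same pattern could also match real XML elements whose names contain only word characters.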