我在MS Excel中提取单词时遇到问题。我连续有多个 HTML 格式的句子,想提取所有以<b>....</b>
示例:
<b>buat</b> <i>v</i> <b>1</b> kerjakan; lakukan; <b>2</b> bikin;<br>--<b> cendol</b>
我要提取以下单词:"buat", "1", "2", "cendol"
您能帮我解决我的问题吗?感谢Excel / VBA中的任何代码。
答案 0 :(得分:3)
有一个非常简单的方法,可以使用HTMLDocument
对象:
在您的VB Editor
中,转到Tools>References
,然后选择Microsoft HTML Object Library
。
然后您可以使用以下代码:
Sub extract()
Dim doc As New HTMLDocument 'Declare and create an object of type HTMLDocument
Dim item As HTMLObjectElement 'Declare an object of type HTMLObjectElement. We will use this to loop through a collection of HTML elements
doc.body.innerHTML = "<b>buat</b> <i>v</i> <b>1</b> kerjakan; lakukan; <b>2</b> bikin;<br>--<b> cendol</b> " 'Assign your HTML code as a string in doc body
For Each item In doc.getElementsByTagName("b") 'Loop through all the <b></b> elements in doc
Debug.Print item.innerText 'print the text contained in <b></b> element. This will show up in your immediate window
Next item
End Sub
答案 1 :(得分:3)
如果您拥有Excel 2013+,则可以使用工作表功能FILTERXML
完成
首先,您需要通过将字符串包含在外部标签中并关闭不匹配的<br>
标签来将字符串更改为“格式正确的” XML:
"<t>" & $A$1 & "</br></t>"
然后只需要使用Xpath
即可返回所有所需标签的问题:
FILTERXML("<t>" & $A$1 & "</br></t>","//b")
在INDEX
函数中进行包装可以一次提取一个子字符串:
完整公式 进入A3并填写
=IFERROR(INDEX(FILTERXML("<t>" & $A$1 & "</br></t>","//b"),ROWS($1:1)),"")
答案 2 :(得分:1)
尝试一下
Sub Test()
Dim objReg As Object
Dim objMatches As Object
Dim match As Object
Dim s As String
Dim i As Integer
s = "<b>buat</b> <i>v</i> <b>1</b> kerjakan; lakukan; <b>2</b> bikin;<br>--<b> cendol</b> "
Set objReg = CreateObject("VBScript.RegExp")
With objReg
.IgnoreCase = False
.Global = True
.Pattern = "<b>(.*?)<\/b>"
Set objMatches = .Execute(s)
End With
For Each match In objMatches
For i = 0 To match.Submatches.Count - 1
Debug.Print Trim(match.Submatches.item(i))
Next i
Next match
Set objReg = Nothing
End Sub
答案 3 :(得分:1)
使用XML DomDocument
的替代方法
分析HTML
字符串,显然可以像在HTMLDocument
或►XML
中那样使用文档对象结构。因此,出于完整性的考虑,我展示了另一种方法,除了 @StavrosJon的有效解决方案(使用更宽松的HTMLDocument
并不需要像XML那样格式正确)之外, :
示例呼叫
Sub ExtractViaXML()
Dim html$, myArray()
html = "<b>buat</b> <i>v</i> <b>1</b> kerjakan; lakukan; <b>2</b> bikin;<br>--<b> cendol</b> "
myArray = getTokens(html, "b") ' assign findings to array via function getTokens()
Debug.Print UBound(myArray) + 1 & " token(s) found: " & Join(myArray, ", ") ' display results
End Sub
主要功能getTokens()
Function getTokens(ByVal html$, Optional myTag$ = "b") As Variant()
' Purpose: isolate "<b>"-Tags (default value) out of html string and return found tokens as variant array
' Note: creates temporary XML DOMDocument (late bound MSXML2 reference)
Dim XmlString$
XmlString = wellformed("<?xml version=""1.0"" encoding=""utf-8""?><tokens>" & html & "</tokens>")
With CreateObject("MSXML2.DOMDocument.6.0")
.ValidateOnParse = True: .Async = False
If .LoadXML(XmlString) Then ' load xml string
Dim myNodeList As Object
Set myNodeList = .DocumentElement.SelectNodes(myTag) ' set node list to memory
Dim i&, ii&, arr()
ii = myNodeList.Length - 1 ' calculate upper boundary of zero-based array
If ii > -1 Then ReDim arr(ii) ' (re)dimension variant array arr()
For i = 0 To ii ' loop through node list
arr(i) = myNodeList.item(i).Text ' assign each found text content to array
Next i
If ii = -1 Then arr = Array("**Nothing found**") ' provide for zero findings
getTokens = arr ' return 0-based 1-dim array with found tokens
Else: ShowParseError (.ParseError) ' optional: display possible error message
End If
End With
End Function
辅助功能
XML 需要具有打开和关闭标签的结构良好的节点结构,或者 HTML 对例如单个换行符(<br>
)。因此,我在 cure 中添加了一个简单的函数wellformed()
,以防止成功加载。此外,我演示了使用可选过程ShowParseError
来本地化(其他)可能的加载错误,您可以将其用作任何.load
或.loadXML
函数的补充。
Function wellformed$(ByVal s$)
' Purpose: force a wellformed version of line breaks in html/xml string ("<br/>")
' Note: unclosed tags like <br> only would prevent a successful load of the xml document
wellformed = Replace(Replace(s, "</br>", "<br>"), "<br>", "<br/>")
End Function
Sub ShowParseError(pe As Object)
' Purpose: display possible parse error
' Note: localizes error occurrence also by indicating position
Dim ErrText$
With pe
ErrText = "Load error " & .ErrorCode & " xml file " & vbCrLf & _
Replace(.URL, "file:///", "") & vbCrLf & vbCrLf & _
.reason & _
"Source Text: " & .srcText & vbCrLf & vbCrLf & _
"Line No.: " & .Line & vbCrLf & _
"Line Pos.: " & .linepos & vbCrLf & _
"File Pos.: " & .filepos & vbCrLf & vbCrLf
End With
MsgBox ErrText, vbExclamation
End Sub
答案 4 :(得分:0)
答案 5 :(得分:0)
我尝试了一些不同的方法,包括拆分,合并和拆分,以及循环槽数组。我在单元格<b>buat</b> <i>v</i> <b>1</b> kerjakan; lakukan; <b>2</b> bikin;<br>--<b> cendol</b>
中键入了文本A1
:
Sub Macro1()
Dim MyWords As Variant
Dim i As Long
Dim MyDelimiter As String
Dim MyLen As Byte
MyDelimiter = "||" 'Choose 1 not common delimiter
MyLen = Len(MyDelimiter)
MyWords = Split(Join(Split(Range("A1").Value, "<b>"), MyDelimiter), "</b>")
For i = 0 To UBound(MyWords) Step 1
Debug.Print Mid(MyWords(i), InStr(1, MyWords(i), MyDelimiter) + MyLen, 99) 'Increase 99 if you are sure there will be longer texts between tags <b>..</b>
Next i
Erase MyWords
End Sub
我明白了: