I am using AutoIt to parse HTML. I want to get all HTML elements that have a given attribute value. Example:
<div data-source="xxx">The div content XXX</div>
<div data-source="zzz">The div content of ZZZ</div>
The div element containing the attribute-value pair data-source="xxx" should be selected.
Answer 0: (score: 0)
You can try something like this using a regular expression:
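#include <Array.au3>
Local $Data = '<div data-source="xxx">The div content XXX</div>' & @CRLF & _
        '<div data-source="zzz">The div content XXX</div>'
MsgBox(0, "", $Data)
; Flag 3 returns an array containing every match of the pattern.
Local $array = StringRegExp($Data, "(\s.*=\x22\w.*\x22)", 3)
_ArrayDisplay($array)
; UBound returns the element count, so the last valid index is UBound - 1.
For $i = 0 To UBound($array) - 1
    MsgBox(0, "", $array[$i])
Next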
Here is another example that shows how to read the contents of an HTML file (Example.html in the code) and display the extracted data:
#include <Array.au3>
#include <FileConstants.au3>
Local Const $sFilePath = "Example.html"
; Open the file for reading and store the handle in a variable.
Local $hFileOpen = FileOpen($sFilePath, $FO_READ)
; FileOpen returns -1 on failure.
If $hFileOpen = -1 Then
    MsgBox(0, "", "Unable to open " & $sFilePath)
    Exit
EndIf
; Read the contents of the file using the handle returned by FileOpen.
Local $sFileRead = FileRead($hFileOpen)
; Close the handle returned by FileOpen.
FileClose($hFileOpen)
Local $Data = $sFileRead
; Flag 3 returns an array containing every match of the pattern.
Local $array = StringRegExp($Data, "(\s.*=\x22\w.*\x22)", 3)
_ArrayDisplay($array)
; UBound returns the element count, so the last valid index is UBound - 1.
For $i = 0 To UBound($array) - 1
    MsgBox(0, "", $array[$i])
Next
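The pattern above extracts every attribute-value pair instead of selecting by a particular value. A minimal sketch of a pattern that targets data-source="xxx" and captures the content of the div follows. The variable names and the pattern itself are illustrative assumptions, and the approach assumes the matching divs are not nested inside each other, which a regular expression cannot handle reliably.

```autoit
Local $sHtml = '<div data-source="xxx">The div content XXX</div>' & @CRLF & _
        '<div data-source="zzz">The div content of ZZZ</div>'

; (?is): case-insensitive matching, and "." also matches line breaks.
; [^>]* keeps the attribute search inside the opening tag.
; (.*?) lazily captures everything up to the first closing </div>.
Local $sPattern = '(?is)<div\b[^>]*\bdata-source="xxx"[^>]*>(.*?)</div>'

; Flag 3 returns an array of all captured groups; @error is set when nothing matches.
Local $aMatches = StringRegExp($sHtml, $sPattern, 3)
If @error Then
    ConsoleWrite("No matching div found." & @CRLF)
Else
    For $i = 0 To UBound($aMatches) - 1
        ConsoleWrite($aMatches[$i] & @CRLF)
    Next
EndIf
```

For anything beyond simple, flat markup, a DOM-based approach is likely to be more robust than a regular expression.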
Answer 1: (score: 0)
Try this:
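; Create an in-memory HTML document and load the markup into it.
Local $ohtml = ObjCreate('HTMLFILE')
$ohtml.body.innerHTML = '<div data-source="xxx">The div content XXX</div>' & @CRLF & _
        '<div data-source="zzz">The div content of ZZZ</div>'
Local $selected_node
; Stop at the first div whose data-source attribute equals "xxx".
For $div In $ohtml.body.getElementsByTagName("div")
    If $div.getAttribute("data-source") = 'xxx' Then
        $selected_node = $div
        ExitLoop
    EndIf
Next
; Only dereference the node if a match was found.
If IsObj($selected_node) Then ConsoleWrite($selected_node.innerHTML & @CRLF)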
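The loop above stops at the first match, while the question asks for all matching elements. A minimal sketch that collects every match could look like the following. The function name _GetInnerHtmlByAttribute and its parameters are hypothetical, not part of AutoIt or any UDF library, and the sketch assumes an AutoIt version that allows zero-length arrays.

```autoit
Local $sHtml = '<div data-source="xxx">The div content XXX</div>' & @CRLF & _
        '<div data-source="zzz">The div content of ZZZ</div>'

Local $aHits = _GetInnerHtmlByAttribute($sHtml, "div", "data-source", "xxx")
For $i = 0 To UBound($aHits) - 1
    ConsoleWrite($aHits[$i] & @CRLF)
Next

; Hypothetical helper: returns an array with the innerHTML of every $sTag
; element whose $sAttr attribute equals $sValue (empty array if none match).
Func _GetInnerHtmlByAttribute($sHtml, $sTag, $sAttr, $sValue)
    Local $aResult[0], $iCount = 0
    Local $oDoc = ObjCreate("HTMLFILE")
    If Not IsObj($oDoc) Then Return SetError(1, 0, $aResult)
    $oDoc.body.innerHTML = $sHtml

    For $oNode In $oDoc.body.getElementsByTagName($sTag)
        Local $vAttr = $oNode.getAttribute($sAttr)
        ; Skip elements without the attribute; "=" compares strings case-insensitively.
        If IsString($vAttr) And $vAttr = $sValue Then
            ReDim $aResult[$iCount + 1]
            $aResult[$iCount] = $oNode.innerHTML
            $iCount += 1
        EndIf
    Next
    Return $aResult
EndFunc
```

Growing the array with ReDim rather than _ArrayAdd avoids the default "|" delimiter splitting HTML content that happens to contain that character. Use "==" in place of "=" if the attribute value should be compared case-sensitively.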