I want to retrieve some information from a piece of HTML. Consider the following:
<ul class="article-additional-info">
<li><strong>Issue Year:</strong> 2011</li>
<li><strong>Issue No:</strong> 1 (200)</li>
<li><strong>Page Range:</strong> 65-80</li>
<li><strong>Page Count:</strong> 15</li>
<li><strong>Language:</strong> Polish</li>
</ul>
I can use document.getElementsByClassName("article-additional-info")[0].innerText to get all the information in the article-additional-info class.
But how do I retrieve an individual value from this class, for example 2011 (from <strong>Issue Year:</strong> 2011<)?
I would like to avoid using RegEx.
Edit
Based on the answers, I modified the code slightly. However, I cannot get rid of one element: Language:. Here is the code:
Answer 0 (score: 0)
Try this:
html =
(
<body>
<ul class="article-additional-info">
<li><strong>Issue Year:</strong> 2011</li>
<li><strong>Issue No:</strong> 1 (200)</li>
<li><strong>Page Range:</strong> 65-80</li>
<li><strong>Page Count:</strong> 15</li>
<li><strong>Language:</strong> Polish</li>
</ul>
<ul class="article-additional-info">
<li><strong>Issue Year:</strong> XX 2011</li>
<li><strong>Issue No:</strong> XX 1 (200)</li>
<li><strong>Page Range:</strong> XX 65-80</li>
<li><strong>Page Count:</strong> XX 15</li>
<li><strong>Language:</strong> XX Polish</li>
</ul>
</body>
)
test := "Language:" ; label of the value you want to return
classno := 1 ; which instance of the class to read (1 = first)
document := ComObjCreate("HTMLfile")
document.write(html)
count := 0
; Collect the innerHTML of every <ul> that has the target class.
try While (x := document.getElementsByTagName("ul")[A_Index-1])
{
    if (x.className = "article-additional-info")
    {
        count++ ; count class instances, not all <ul> elements
        yclass%count% := x.innerHTML
    }
}
html := yclass%classno%
document.Close
; Load only the selected list into a fresh document.
document := ComObjCreate("HTMLfile")
document.write(html)
y := StrLen(test)
try While (x := document.getElementsByTagName("strong")[A_Index-1])
{
    if (x.innerText = test)
        msgbox % substr(x.parentnode.innerText, y+2) ; returns "Polish"
}
ExitApp
And if you want to iterate over all class instances and all labels, do this:
html =
(
<body>
<ul class="article-additional-info">
<li><strong>Issue Year:</strong> 2011</li>
<li><strong>Issue No:</strong> 1 (200)</li>
<li><strong>Page Range:</strong> 65-80</li>
<li><strong>Page Count:</strong> 15</li>
<li><strong>Language:</strong> Polish</li>
</ul>
<ul class="ao">
<li><strong>Issue Year:</strong> zz 2011</li>
<li><strong>Issue No:</strong> zz 1 (200)</li>
<li><strong>Page Range:</strong> zz 65-80</li>
<li><strong>Page Count:</strong> zz 15</li>
<li><strong>Language:</strong> zz Polish</li>
</ul>
<ul class="article-additional-info">
<li><strong>Issue Year:</strong> XX 2011</li>
<li><strong>Issue No:</strong> XX 1 (200)</li>
<li><strong>Page Range:</strong> XX 65-80</li>
<li><strong>Page Count:</strong> XX 15</li>
<li><strong>Language:</strong> XX Polish</li>
</ul>
</body>
)
document := ComObjCreate("HTMLfile")
document.write(html)
; To skip a label, set its entry to "" (here only the first three are shown).
test := ["Issue Year:", "Issue No:", "Page Range:", "", ""]
count := 0
try While (x := document.getElementsByTagName("ul")[A_Index-1])
{
    if (x.className = "article-additional-info")
    {
        count++
        yclass%count% := x.innerHTML
    }
}
which := 0
loop, %count%
{
    which++
    html := yclass%A_Index%
    document.Close
    document := ComObjCreate("HTMLfile")
    document.write(html)
    try While (x := document.getElementsByTagName("strong")[A_Index-1])
    {
        y := StrLen(test[A_Index]) ; length of the label for this <strong>
        if (test[A_Index] <> "")
            msgbox % which . ": " . test[A_Index] . " " . substr(x.parentnode.innerText, y+2)
    }
}
ExitApp
substr(x.parentnode.innerText, y+2) is the value you are looking for.
Have fun!!
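Answer 1 (score: -1)
You can easily manipulate HTML through the HTMLFile COM object and parse the resulting text with StrSplit(). Below is an example using the HTML and DOM query you provided: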
html =
(
<ul class="article-additional-info">
<li><strong>Issue Year:</strong> 2011</li>
<li><strong>Issue No:</strong> 1 (200)</li>
<li><strong>Page Range:</strong> 65-80</li>
<li><strong>Page Count:</strong> 15</li>
<li><strong>Language:</strong> Polish</li>
</ul>
)
document := ComObjCreate("HTMLfile")
document.write(html)
x := document.getElementsByClassName("article-additional-info")[0].innerText
MsgBox % StrSplit(StrSplit(x, "`n", "`r").5, " ").2
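The nested StrSplit() call above depends on the position of each line and on the value containing no spaces. As a rough alternative, here is a minimal sketch (AutoHotkey v1 syntax, not tested against the live page) that looks each value up by its label instead. The names items and info are my own, and it assumes every <li> starts with a <strong> label as in the sample HTML:

```autohotkey
html =
(
<ul class="article-additional-info">
<li><strong>Issue Year:</strong> 2011</li>
<li><strong>Issue No:</strong> 1 (200)</li>
<li><strong>Page Range:</strong> 65-80</li>
<li><strong>Page Count:</strong> 15</li>
<li><strong>Language:</strong> Polish</li>
</ul>
)
document := ComObjCreate("HTMLfile")
document.write(html)
items := document.getElementsByTagName("li")
info := {} ; hypothetical map: label without the colon -> value
Loop % items.length
{
    li := items[A_Index - 1]
    ; Assumption: the first <strong> inside each <li> holds the label.
    label := li.getElementsByTagName("strong")[0].innerText
    ; Everything after the label is the value; trim surrounding whitespace.
    value := Trim(SubStr(li.innerText, StrLen(label) + 1))
    info[RTrim(label, ":")] := value
}
MsgBox % info["Issue Year"]
```

Because the lookup is by label, values that contain spaces, such as 1 (200), should come back whole, and the order of the list items does not matter.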
Edit:
url := "https://www.ceeol.com/search/article-detail?id=134854"
html := getPage(url)
document := ComObjCreate("HTMLfile")
document.write(html)
x := document.getElementsByClassName("article-additional-info")[0].innerText
For k, v in StrSplit(x, "`n", "`r") {
    r .= StrSplit(v, ": ").2 "`n"
}
MsgBox % r
getPage(url) {
    whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    whr.Open("GET", url, true)
    whr.Send()
    ; Using 'true' above and the call below allows the script to remain responsive.
    whr.WaitForResponse()
    return whr.ResponseText
}
Another example using querySelectorAll:
wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := True
wb.Navigate("https://www.ceeol.com/search/article-detail?id=134854")
While wb.Busy
    sleep 100
loop, 3
    r .= wb.document.querySelectorAll(".article-additional-info li")[A_Index-1].lastChild.nodeValue "`n"
msgbox % r
wb.quit()
exitapp