Detect when a web page has loaded without using Sleep

Time: 2014-04-23 00:05:15

Tags: internet-explorer dom web-scraping vbscript wsh

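I am writing a VBScript on Windows that opens a site in IE. What I want: detect when the web page has finished loading and display a message. I achieved this by calling WScript.Sleep for roughly the number of seconds the site takes to load. However, the site pops up a user name and password prompt midway, and the page only finishes loading after the user enters credentials. So instead of sleeping for an approximate number of seconds, I want an exact function or method that detects when the page has loaded. I searched online and tried a Do While loop and the onload and onclick handlers, but nothing worked. To simplify: even if I write a script that opens a site like Yahoo and displays the message "Hi" once the page has loaded, it does not work without WScript.Sleep.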

2 answers:

Answer 0 (score: 5)

Try the conventional method:

Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
objIE.Navigate "https://www.yahoo.com/"
Do While objIE.ReadyState <> 4
    WScript.Sleep 10
Loop
' your code here
' ...

UPD: this version should also check for errors:

Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
objIE.Navigate "https://www.yahoo.com/"
On Error Resume Next
Do 
    If objIE.ReadyState = 4 Then
        If Err = 0 Then
            Exit Do
        Else
            Err.Clear
        End If
    End If
    WScript.Sleep 10
Loop
On Error Goto 0
' your code here
' ...

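UPD2: You wrote that IE gets disconnected when the login pop-up appears. Hypothetically, there is a way to catch the disconnection and then get hold of the IE instance again. Note that this is "abnormal programming" :) I hope this helps: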

Option Explicit
Dim objIE, strSignature, strInitType

Set objIE = CreateObject("InternetExplorer.Application") ' create IE instance
objIE.Visible = True
strSignature = Left(CreateObject("Scriptlet.TypeLib").GUID, 38) ' generate uid
objIE.putproperty "marker", strSignature ' tokenize the instance
strInitType = TypeName(objIE) ' get typename
objIE.Navigate "https://www.yahoo.com/"
MsgBox "Initial type = " & TypeName(objIE) ' for visualisation

On Error Resume Next
Do While TypeName(objIE) = strInitType ' wait until typename changes (ActiveX disconnection), may cause error 800A000E if not within OERN
    WScript.Sleep 10
Loop
MsgBox "Changed type = " & TypeName(objIE) ' for visualisation

Set objIE = Nothing ' redundant statement, just for clarity
Do
    For Each objIE In CreateObject("Shell.Application").Windows ' loop through all explorer windows to find tokenized instance
        If objIE.getproperty("marker") = strSignature Then ' our instance found
            If TypeName(objIE) = strInitType Then Exit Do ' may be excessive type check
        End If
    Next
    WScript.Sleep 10
Loop
MsgBox "Found type = " & TypeName(objIE) ' for visualisation
On Error GoTo 0

Do While objIE.ReadyState <> 4 ' conventional wait if instance not ready
    WScript.Sleep 10
Loop

MsgBox "Title = " & objIE.Document.Title ' for visualisation

You can get all text nodes, links, and so on from the DOM, as follows:

Option Explicit
Dim objIE, colTags, strResult, objTag, objChild, arrResult

Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
objIE.Navigate "https://www.yahoo.com/"

Do While objIE.ReadyState <> 4
    WScript.Sleep 10
Loop

Set colTags = objIE.Document.GetElementsByTagName("a")
strResult = "Total " & colTags.Length & " DOM Anchor Nodes:" & vbCrLf
For Each objTag In colTags
    strResult = strResult & objTag.GetAttribute("href") & vbCrLf
Next
ShowInNotepad strResult

Set colTags = objIE.Document.GetElementsByTagName("*")
arrResult = Array()
For Each objTag In colTags
    For Each objChild In objTag.ChildNodes
        If objChild.NodeType = 3 Then
            ReDim Preserve arrResult(UBound(arrResult) + 1)
            arrResult(UBound(arrResult)) = objChild.NodeValue
        End If
    Next
Next
strResult = "Total " & colTags.Length & " DOM object nodes + total " & UBound(arrResult) + 1 & " #text nodes:" & vbCrLf
strResult = strResult & Join(arrResult, vbCrLf)
ShowInNotepad strResult

objIE.Quit

Sub ShowInNotepad(strToFile)
    Dim strTempPath
    With CreateObject("Scripting.FileSystemObject")
        strTempPath = CreateObject("WScript.Shell").ExpandEnvironmentStrings("%TEMP%") & "\" & .gettempname
        With .CreateTextFile(strTempPath, True, True)
            .WriteLine (strToFile)
            .Close
        End With
        CreateObject("WScript.Shell").Run "notepad.exe " & strTempPath, 1, True
        .DeleteFile (strTempPath)
    End With
End Sub

See also: get text data

UPD3: Here are some further checks that web page loading and initialization have completed:

' ...
' Navigating to some url
objIE.Navigate strUrl
' Wait for IE ready
Do While objIE.ReadyState <> 4 Or objIE.Busy
    WScript.Sleep 10
Loop
' Wait for document complete
Do While objIE.Document.ReadyState <> "complete"
    WScript.Sleep 10
Loop
' Processing loaded webpage code
' ...

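UPD4: In some cases you need to track whether a target node has been created in the document yet. This is usually necessary if you get an "Object required" error when trying to access the node via .getElementById and similar methods.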

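This happens if the page uses AJAX: the loaded page source HTML does not contain the target node, and active content such as JavaScript creates it dynamically. The page fragment below shows what that could look like. The text node 5.99 might be created only after the page has completely loaded and some further requests to the server for extra data have taken place: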

...
<td class="price-label">
    <span id="priceblock" class="price-big color">
        5.99
    </span>
</td>
...

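Another case is when you load, e.g., a Google search results page and wait for the Next button to appear (especially if you invoked the .click method on the previous page), or when you load a page with a login web form and wait for a username input field such as <input name="userID" id="userID" type="text" maxlength="24" required="" placeholder="Username" autofocus="">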

The code below adds a further check that the target node is accessible:

With objIE
    ' Navigating to some url
    .Navigate strUrl
    ' Wait for IE ready
    Do While .ReadyState <> 4 Or .Busy
        WScript.Sleep 10
    Loop
    ' Wait for document complete
    Do While .Document.ReadyState <> "complete"
        WScript.Sleep 10
    Loop
    ' Wait for target node created
    Do While TypeName(.Document.getElementById("userID")) = "Null"
        WScript.Sleep 10
    Loop
    ' Processing target node
    .Document.getElementById("userID").Value = "myusername"
    ' ...
    '
End With
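
The node wait above loops forever if the target node never appears. A minimal sketch of a time-limited variant follows. The helper name WaitForNodeById, its timeout parameter, and the example.com URL are hypothetical and not part of the original answer. Elapsed time is approximated by summing the sleep intervals, so the real wait will be somewhat longer than the requested timeout.

```vbscript
Option Explicit
Dim objIE, strUrl

strUrl = "https://example.com/login" ' hypothetical URL, replace with the real page
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
objIE.Navigate strUrl

' Same two waits as in the answer above
Do While objIE.ReadyState <> 4 Or objIE.Busy
    WScript.Sleep 10
Loop
Do While objIE.Document.ReadyState <> "complete"
    WScript.Sleep 10
Loop

' Give the target node up to about 30 seconds to appear
If WaitForNodeById(objIE, "userID", 30000) Then
    objIE.Document.getElementById("userID").Value = "myusername"
Else
    MsgBox "Timed out waiting for node 'userID'"
End If

' Hypothetical helper: polls until an element with the given id exists.
' Returns True if the node was found, False once roughly lngTimeoutMs
' milliseconds of sleeping have accumulated.
Function WaitForNodeById(objBrowser, strId, lngTimeoutMs)
    Const POLL_MS = 10
    Dim lngWaited, strType
    WaitForNodeById = False
    lngWaited = 0
    Do While lngWaited < lngTimeoutMs
        ' Same TypeName test as above; "Nothing" is also treated as "not found"
        strType = TypeName(objBrowser.Document.getElementById(strId))
        If strType <> "Null" And strType <> "Nothing" Then
            WaitForNodeById = True
            Exit Do
        End If
        WScript.Sleep POLL_MS
        lngWaited = lngWaited + POLL_MS
    Loop
End Function
```

Returning a Boolean rather than raising an error lets the caller decide whether a missing node is fatal or just a reason to retry.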

Answer 1 (score: 0)

The following check by element solved it for me:

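Function waitLoadByElement(p_ElementName)
    ' Relies on a global IE object; MyEcho is the author's own logging helper (not shown here)
    Dim v_counter
    v_counter = 0

    ' Wait for the browser to finish navigating
    Do While IE.ReadyState <> 4 Or IE.Busy
        WScript.Sleep 1000
    Loop

    ' Wait for the document to finish loading
    Do While IE.Document.ReadyState <> "complete"
        WScript.Sleep 1000
    Loop

    ' This is the interesting part: poll until the body HTML contains the given text
    Do While InStr(IE.Document.getElementsByTagName("body")(0).InnerHTML, p_ElementName) < 1
        v_counter = v_counter + 1
        WScript.Sleep 1000
    Loop

    If v_counter > 0 Then
        MyEcho "[ Waited Object to Load ] : " & v_counter & " - Seconds"
    End If
End Function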
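
The function above depends on a global IE object and a MyEcho helper that the answer does not show. A possible way to call it might look like the sketch below, placed in the same script file as the function. The URL, the "userID" marker text, and the MyEcho stand-in are assumptions for illustration only. Note that the call never returns if the marker text never appears in the page body.

```vbscript
Dim IE, strUrl

strUrl = "https://example.com/login" ' hypothetical URL, replace with the real page
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate strUrl

' Blocks until the body HTML contains the text "userID"
waitLoadByElement "userID"
MsgBox "Hi"

' Stand-in for the author's logging helper, which is not shown in the answer
Sub MyEcho(p_Message)
    WScript.Echo p_Message
End Sub
```

Because the check is a plain substring search over InnerHTML, any text known to appear only in the fully rendered page can serve as the marker, not just an element id.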