Accessing the next web page after clicking a link

Time: 2015-01-11 12:00:26

Tags: html regex html5 powershell dom

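Requirement: click a link on the web page specified in $ie.Navigate below. I then need to access the HTML / outerHTML source of the web page that opens next.

Example: when I open https://www.healthkartplus.com/search/all?name=Sporanox (by setting $control = Sporanox), the code below simply clicks the first matching link. After the link is clicked, I need to access the HTML of the resulting page.

Update: I was referred to another SO question and learned that we can search for the appropriate window. The code seems to work for some scenarios, but not for all of them. For $ie2, I run into problems when accessing the Document property.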

function getStringMatch
{
    # Loop through each search term in $controls
    foreach ($control In $controls)
    {
        $ie = New-Object -COMObject InternetExplorer.Application
        $ie.visible = $true
        $site = $ie.Navigate("https://www.healthkartplus.com/search/all?name=$control")
        $ie.ReadyState

        while ($ie.Busy -and $ie.ReadyState -ne 4){ sleep -Milliseconds 100 }

        $link = $null
        $link = $ie.Document.get_links() | where-object {$_.innerText -eq "$control"}
        $link.click()

        while ($ie.Busy -and $ie.ReadyState -ne 4){ sleep -Milliseconds 100 }

        $ie2 = (New-Object -COM 'Shell.Application').Windows() | ? {
            $_.Name -eq 'Windows Internet Explorer' -and $_.LocationName -match "^$control"
        }

        # NEED outerHTML of new page. CURRENTLY it is working for some.

        $ie.Document.body.outerHTML > d:\med$control.txt
    }
}

$controls = "Sporanox"

getStringMatch

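1 answer:

Answer 0 (score: 1)

I think the problem is in how you look up the link on the first page. The link's innerText is not equal to $control; it contains $control. That is, the innerText is "Sporanox (100mg)".

The following may help: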

$link = $ie.Document.get_links() | where-object {if ($_.innerText){$_.innerText.contains($control)}}
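
To see why the equality test finds nothing while the substring test does, here is a minimal sketch that runs without Internet Explorer. The link texts are made-up stand-ins for what $ie.Document.get_links() might return on the search page (only "Sporanox (100mg)" comes from this answer), and the variable names are assumptions for illustration.

```powershell
# Hypothetical link texts standing in for the innerText of each link on the page.
$control = 'Sporanox'
$linkTexts = @('Home', $null, 'Sporanox (100mg)', 'Sporanox Capsule')

# Exact match: no sample text is equal to the search term.
$exact = @($linkTexts | Where-Object { $_ -eq $control })

# Substring match; the leading $_ check skips links that have no text,
# because calling .Contains() on $null would raise an error.
$partial = @($linkTexts | Where-Object { $_ -and $_.Contains($control) })

"Exact matches:   $($exact.Count)"
"Partial matches: $($partial.Count)"
"First partial:   $($partial[0])"
```

Note that String.Contains is case-sensitive, so a term typed in a different case than the link text would still not match; the -like operator with wildcards ("*$control*") is a case-insensitive alternative.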

Edit

Here is the complete code I used:

function getStringMatch
{
    # Search the site for each term in $controls and click the first matching link
    foreach ($control in $controls)
    {
        $ie = New-Object -COMObject InternetExplorer.Application
        $ie.visible = $true
        $ie.Navigate("https://www.healthkartplus.com/search/all?name=$control")

        # Wait until the search page has finished loading (ReadyState 4 = complete)
        while ($ie.Busy -or $ie.ReadyState -ne 4){ sleep -Milliseconds 100 }

        # Match links whose text contains the search term; links without text are skipped
        $link = $null
        $link = $ie.Document.get_links() | where-object {if ($_.innerText){$_.innerText.contains($control)}} | select-object -First 1
        $link.click()

        # Wait for the clicked page to finish loading
        while ($ie.Busy)
        {
            sleep -Milliseconds 100
        }

        # Save the outerHTML of the new page
        $ie.Document.body.outerHTML > d:\med$control.txt
    }
}

$controls = "Sporanox"

getStringMatch
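
One more thing worth checking is the wait after each navigation. The loop in the question, while ($ie.Busy -and $ie.ReadyState -ne 4), stops as soon as either condition clears, so it can exit before the page is complete. Below is a rough sketch of a wait helper with a timeout. Wait-BrowserReady and its parameters are hypothetical names, not part of any library, and the stand-in object only imitates the Busy and ReadyState properties of the InternetExplorer.Application COM object so the sketch can run without a browser.

```powershell
# Hypothetical helper: polls until the browser object is idle and fully loaded,
# or gives up after the timeout. Returns $true when ready, $false on timeout.
function Wait-BrowserReady {
    param(
        [Parameter(Mandatory = $true)] $Browser,   # any object exposing Busy and ReadyState
        [int] $TimeoutSeconds = 30
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    # Keep waiting while EITHER check says the page is not ready (4 = complete).
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        if ((Get-Date) -gt $deadline) { return $false }
        Start-Sleep -Milliseconds 100
    }
    return $true
}

# Stand-in objects so the sketch runs without Internet Explorer.
$readyPage   = [pscustomobject]@{ Busy = $false; ReadyState = 4 }
$loadingPage = [pscustomobject]@{ Busy = $true;  ReadyState = 1 }

$readyResult   = Wait-BrowserReady -Browser $readyPage
$loadingResult = Wait-BrowserReady -Browser $loadingPage -TimeoutSeconds 0
```

In the real script the call would be Wait-BrowserReady -Browser $ie after Navigate and again after $link.click(). Checking the return value before reading $ie.Document avoids saving a half-loaded page.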