PowerShell: retrieving HTML from an open tab

Time: 2018-06-15 09:59:54

Tags: powershell internet-explorer

I am trying to use PowerShell to retrieve the links from a web page that is rendered dynamically by JavaScript.

This is the code I have so far:

#  Create IE object and load URL
$ie = New-Object -comobject "InternetExplorer.Application"
$ie.visible = $true
$ie.navigate($url)

 # Wait for the page to load 

while ($ie.Busy -eq $true -Or $ie.ReadyState -ne 4) {Start-Sleep 2}

$Doc = $ie.Document
$divs = $Doc.getElementsByTagName('a')

foreach ($div in $divs){
write-host $div['id'].value
write-host $div['tagName'].value
write-host $div['parentElement'].value
write-host $div['style'].value
write-host $div['document'].value
write-host $div['sourceIndex'].value
write-host $div['offsetLeft'].value
write-host $div['offsetTop'].value
write-host $div['offsetWidth'].value
write-host $div['offsetHeight'].value
write-host $div['offsetParent'].value
write-host $div['innerHTML'].value
write-host $div['innerText'].value
write-host $div['outerHTML'].value
write-host $div['outerText'].value
write-host $div['parentTextEdit'].value
}

However, every line of output is blank.

(FYI: if I output just $div, I get System.__ComObject.)

Can anyone explain what I need to do to get at the information?

Thanks.

1 answer:

Answer 0 (score: 0)

First, I would rename $divs to $tags and use $tag instead of $div in the foreach loop, because you are not really looking for divs but for <a> tags.

Each $tag has a getAttribute method, which you should use instead of $tag['attribName'].value.

Your code could look something like this:

$ie = New-Object -comobject "InternetExplorer.Application"
$ie.visible = $true
$ie.navigate($url)

# Wait for the page to load
while ($ie.Busy -eq $true -Or $ie.ReadyState -ne 4) {Start-Sleep 2}

$Doc = $ie.Document
$tags = $Doc.getElementsByTagName("a")

foreach ($tag in $tags){
    Write-Host $tag.getAttribute("id")
    Write-Host $tag.getAttribute("tagName")
    Write-Host $tag.getAttribute("parentElement")
    Write-Host $tag.getAttribute("style")
    Write-Host $tag.getAttribute("document")
    Write-Host $tag.getAttribute("sourceIndex")
    Write-Host $tag.getAttribute("offsetLeft")
    Write-Host $tag.getAttribute("offsetTop")
    Write-Host $tag.getAttribute("offsetWidth")
    Write-Host $tag.getAttribute("offsetHeight")
    Write-Host $tag.getAttribute("offsetParent")
    Write-Host $tag.getAttribute("innerHTML")
    Write-Host $tag.getAttribute("innerText")
    Write-Host $tag.getAttribute("outerHTML")
    Write-Host $tag.getAttribute("outerText")
    Write-Host $tag.getAttribute("parentTextEdit")
}

Also, when you are done, you should clean up the COM object you created:

[System.Runtime.InteropServices.Marshal]::ReleaseComObject($ie) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

Edit

For better output I suggest:

$Doc = $ie.Document
$tags = $Doc.getElementsByTagName("a")
#$tags = $doc.all.tags("a")

$attribs = @("id", "tagName", "parentElement", "style", "document", "sourceIndex", 
             "offsetLeft", "offsetTop", "offsetWidth", "offsetHeight", "offsetParent", 
             "innerHTML", "innerText", "outerHTML", "outerText", "parentTextEdit")

# Emit one name/value hashtable per attribute of every <a> tag
foreach ($tag in $tags){
    $attribs | ForEach-Object {
       @{ $_ = $tag.getAttribute($_) }
    }
}
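
If you would rather get one record per link instead of a stream of separate name/value pairs, the same getAttribute calls can be collected into a single object per tag. The following is only a rough sketch and not part of the original answer: Get-TagInfo is a hypothetical helper name, and $fakeTag is a stand-in object with made-up values so the snippet runs without a browser. It assumes PowerShell 3.0 or later for [ordered] and [PSCustomObject].

```powershell
# Hypothetical helper (an assumption, not from the original answer): gathers the
# requested attributes of one element into a single object, one record per tag.
function Get-TagInfo {
    param(
        [Parameter(Mandatory)] $Tag,                  # any object exposing getAttribute(name)
        [Parameter(Mandatory)] [string[]] $Attributes # attribute names to read
    )
    $info = [ordered]@{}
    foreach ($name in $Attributes) {
        $info[$name] = $Tag.getAttribute($name)
    }
    [PSCustomObject]$info
}

# Stand-in for an IE element with made-up values, used only so the sketch
# can run without Internet Explorer.
$fakeTag = New-Object PSObject
$fakeTag | Add-Member -MemberType ScriptMethod -Name getAttribute -Value {
    param($name)
    $values = @{ id = 'link1'; innerText = 'Home' }
    $values[$name]
}

$result = Get-TagInfo -Tag $fakeTag -Attributes @('id', 'innerText')
$result | Format-List
```

Against the real page you would presumably call it inside the existing loop, for example foreach ($tag in $tags) { Get-TagInfo -Tag $tag -Attributes $attribs }, and pipe the results to Format-Table or Export-Csv as needed.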