Reading HTML content with PowerShell

Time: 2014-02-26 06:40:17

Tags: html powershell csv powershell-v2.0

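Apologies for my limited knowledge of PowerShell. I am trying to read HTML content from a website and output it as a CSV file. So far, my PowerShell script can successfully download the whole HTML source: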

$url = "http://cloudmonitor.ca.com/en/ping.php?vtt=1392966369&varghost=www.yahoo.com&vhost=_&vaction=ping&ping=start";
$Path = "$env:userprofile\Desktop\test.txt"

$ie = New-Object -com InternetExplorer.Application 
$ie.visible = $true
$ie.navigate($url)

while($ie.ReadyState -ne 4) { start-sleep -s 10 }

#$ie.Document.Body.InnerText | Out-File -FilePath $Path
$ie.Document.Body | Out-File -FilePath $Path
$ie.Quit()

The HTML I get back looks like this:

  ........
  <tr class="light-grey-bg">
  <td class="right-dotted-border">Stockholm, Sweden (sesto01):</td>
  <td class="right-dotted-border"><span id="cp20">Okay</span>
  </td>
  <td class="right-dotted-border"><span id="minrtt20">21.8</span>
  </td>
  <td class="right-dotted-border"><span id="avgrtt20">21.8</span>
  </td>
  <td class="right-dotted-border"><span id="maxrtt20">21.9</span>
  </td>
  <td><span id="ip20">2a00:1288:f00e:1fe::3001</span>
  </td>
  </tr>
  ........

But what I actually want is to write the content to a CSV file, like this:

Stockholm Sweden (sesto01),Okay,21.8,21.8,21.9,2a00:1288:f00e:1fe::3001
........

Which command can help me accomplish this?

1 Answer:

Answer 0 (score: 1)

Thanks for the CA site; this was interesting for me too. I wrote this up quickly, so it needs improvement.

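Here is a way to do it with Html-Agility-Pack. In what follows, I assume HtmlAgilityPack.dll is located in an Html-Agility-Pack subdirectory of the directory that contains the script file.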

# PingFromTheCloud.ps1

$url = "http://cloudmonitor.ca.com/en/ping.php?vtt=1392966369&varghost=www.silogix.fr&vhost=_&vaction=ping&ping=start";
$Path = "c:\temp\Pingtest.htm"

$ie = New-Object -com InternetExplorer.Application 
$ie.visible = $true
$ie.navigate($url)

while($ie.ReadyState -ne 4) { start-sleep -s 10 }

#$ie.Document.Body.InnerText | Out-File -FilePath $Path
$ie.Document.Body | Out-File -FilePath $Path
$ie.Quit()

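# $PSCommandPath only exists in PowerShell 3.0 and later; $MyInvocation also works on 2.0 (the question's tag)
$scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
Add-Type -Path "$scriptDir\Html-Agility-Pack\HtmlAgilityPack.dll"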


$webGraber = New-Object -TypeName HtmlAgilityPack.HtmlWeb
$webDoc = $webGraber.Load("c:\temp\Pingtest.htm")
$Thetable = $webDoc.DocumentNode.ChildNodes.Descendants('table') | where {$_.XPath -eq '/div[3]/div[1]/div[5]/table[1]/table[1]'}

$trDatas = $Thetable.ChildNodes.Elements("tr")

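# Delete any previous output; do not report an error if the file does not exist yet
Remove-Item "c:\temp\Pingtest.csv" -ErrorAction SilentlyContinue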

foreach ($trData in $trDatas)
{
  # Collect the trimmed text of every <td> cell in this row
  $cells = @()
  foreach ($tdData in $trData.Elements("td"))
  {
    $cells += $tdData.InnerText.Trim()
  }
  # Skip rows without <td> cells; -join leaves no trailing comma to strip
  if ($cells.Count -gt 0)
  {
    $cells -join ',' | Out-File -FilePath "c:\temp\Pingtest.csv" -Append
  }
}
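
Note that the script above joins the raw cell text with commas, so a cell such as "Stockholm, Sweden (sesto01):" ends up split across two CSV fields. The following is a minimal sketch of one way to get the line format requested in the question, assuming the cell texts have already been extracted into an array of strings. The helper name ConvertTo-PingCsvLine is hypothetical and not part of the original answer; it simply drops embedded commas and a trailing colon from each cell.

```powershell
# Hypothetical helper (not from the original answer): builds one CSV line
# from the text of a single table row's cells.
function ConvertTo-PingCsvLine {
    param([string[]]$Cells)

    $clean = foreach ($cell in $Cells) {
        # Trim whitespace, drop a trailing colon, then remove embedded commas
        # so that each cell stays within a single CSV field.
        ($cell.Trim() -replace ':$', '') -replace ',', ''
    }

    # -join does not add a trailing comma
    $clean -join ','
}

# Sample cell values copied from the HTML fragment in the question
$cells = 'Stockholm, Sweden (sesto01):', "Okay`n", '21.8', '21.8', '21.9', '2a00:1288:f00e:1fe::3001'
ConvertTo-PingCsvLine -Cells $cells
```

Inside the row loop, the same function could be called with the collected cell texts before writing the line with Out-File. If the commas need to be preserved instead of removed, quoting each field (for example via Export-Csv) would be the alternative.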