This is what I did in PowerShell:
PS > $source = "http://www.bing.com/search?q=sqrt(2)"
PS > $result = Invoke-WebRequest $source
PS > $resultContainer = $result.ParsedHtml.GetElementById("results_container")
This is the error message I received:
The property 'ParsedHtml' cannot be found on this object. Verify that the property exists.
At line:1 char:1
+ $resultContainer = $result.ParsedHtml.GetElementById("results_contain ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], PropertyNotFoundException
+ FullyQualifiedErrorId : PropertyNotFoundStrict
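Answer 0 (score: 4)
I don't believe you can do this with PowerShell on non-Windows platforms (at least not yet). To parse HTML content, PowerShell uses MSHTML.DLL and/or other Internet Explorer / Edge components that do not exist outside Windows. Note that GetElementById just proxies to the COM object, and there are no COM objects in your environment.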
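A minimal sketch of guarding against this error, assuming the script should run on platforms where the response object may lack ParsedHtml: test for the property before touching it. The helper name Test-HasProperty and the stand-in object $fakeResponse are hypothetical, not part of PowerShell or of the original question.

```powershell
# Hypothetical helper: returns $true when the object exposes a property
# with the given name. Looking the name up through PSObject.Properties
# does not throw, even under Set-StrictMode.
function Test-HasProperty {
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string] $Name
    )
    return $null -ne $InputObject.PSObject.Properties[$Name]
}

# Stand-in for a response object that has no ParsedHtml property.
$fakeResponse = [pscustomobject]@{
    Content    = '<html></html>'
    RawContent = 'HTTP/1.1 200 OK'
}

if (Test-HasProperty $fakeResponse 'ParsedHtml') {
    'ParsedHtml is available'
} else {
    'ParsedHtml is not available; fall back to Content or RawContent'
}
```

With a real response, the same check would be Test-HasProperty $result 'ParsedHtml'.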
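You can inspect the RawContent property of the object returned by Invoke-WebRequest and parse that string yourself to find what you need, but parsing HTML with regular expressions is inadvisable, so you'll have to use some other method.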
By the way, I could not find an element with the id results_container that you used in your example.
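As a rough way to confirm whether an id appears in the raw response text at all, something like the following could be used. This is a crude presence test, not HTML parsing: it would also match attributes such as data-id, and it says nothing about document structure. The function name Test-HtmlContainsId and the sample markup are made up for illustration.

```powershell
# Hypothetical helper: crude check for an id attribute in raw HTML text.
# This only tests for presence; it does not parse the document.
function Test-HtmlContainsId {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $Id
    )
    # Matches id=value, id="value" or id='value', case-insensitively,
    # and rejects longer ids that merely start with the requested value.
    $pattern = '\bid\s*=\s*["'']?' + [regex]::Escape($Id) + '(?![\w-])'
    return [regex]::IsMatch($Html, $pattern, 'IgnoreCase')
}

# Made-up sample markup standing in for $result.RawContent.
$sample = '<div id="content"><span id=logo></span></div>'

Test-HtmlContainsId -Html $sample -Id 'logo'
Test-HtmlContainsId -Html $sample -Id 'results_container'
```

Against a real response, the first argument would be $result.RawContent or $result.Content.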
Answer 1 (score: 0)
One approach that works (though it is a bit messy) is to use AngleSharp as a .NET assembly in PowerShell. This is also suggested in a PowerShell GitHub issue.
[string]$html = "<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content=""initial-scale=1, minimum-scale=1, width=device-width"">
<title>Error 404 (Not Found)!!1</title>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>404.</b> <ins>That’s an error.</ins>
<p>The requested URL <code>/error</code> was not found on this server. <ins>That’s all we know.</ins>";
#Loads assembly for angle sharp: https://stackoverflow.com/questions/39257572/loading-assemblies-from-nuget-packages
#WARNING: probably in a non-portable way.
$standardAssemblyFullPath = (Get-ChildItem -Filter *.dll -Recurse (Split-Path (get-package AngleSharp).Source)).FullName | Where-Object {$_ -like "*standard*"}
Add-Type -Path $standardAssemblyFullPath
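# This uses the AngleSharp 0.9.x API. In AngleSharp 0.10 and later the parser
# lives at AngleSharp.Html.Parser.HtmlParser and the method is ParseDocument.
$parser = New-Object AngleSharp.Parser.Html.HtmlParser
$document = $parser.Parse($html)
# Select every element whose id is "logo" and print its markup.
$elements = $document.All | Where-Object {$_.id -eq "logo"}
Write-Host $elements.OuterHtml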