How to export a word count to an Excel spreadsheet using PowerShell

Date: 2016-11-05 01:03:51

Tags: excel powershell scripting

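I have this PowerShell script that strips the HTML tags, leaving only the text, and displays the word count of that HTML file when it runs. My problem is that when I execute: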

function Html-ToText {
    param([System.String] $html)

    # remove line breaks, replace with spaces
    $html = $html -replace "(`r|`n|`t)", " "
    # write-verbose "removed line breaks: `n`n$html`n"

    # remove invisible content
    @('head', 'style', 'script', 'object', 'embed', 'applet', 'noframes', 'noscript', 'noembed') | % {
        $html = $html -replace "<$_[^>]*?>.*?</$_>", ""
    }
    # write-verbose "removed invisible blocks: `n`n$html`n"

    # Condense extra whitespace
    $html = $html -replace "( )+", " "
    # write-verbose "condensed whitespace: `n`n$html`n"

    # Add line breaks
    @('div','p','blockquote','h[1-9]') | % { $html = $html -replace "</?$_[^>]*?>.*?</$_>", ("`n" + '$0') }
    # Add line breaks for self-closing tags
    @('div','p','blockquote','h[1-9]','br') | % { $html = $html -replace "<$_[^>]*?/>", ('$0' + "`n") }
    # write-verbose "added line breaks: `n`n$html`n"

    # strip tags
    $html = $html -replace "<[^>]*?>", ""
    # write-verbose "removed tags: `n`n$html`n"

    # replace common entities
    @(
        @("&bull;", " * "),
        @("&lsaquo;", "<"),
        @("&rsaquo;", ">"),
        @("&(rsquo|lsquo);", "'"),
        @("&(quot|ldquo|rdquo);", '"'),
        @("&trade;", "(tm)"),
        @("&frasl;", "/"),
        @("&(quot|#34|#034|#x22);", '"'),
        @('&(amp|#38|#038|#x26);', "&"),
        @("&(lt|#60|#060|#x3c);", "<"),
        @("&(gt|#62|#062|#x3e);", ">"),
        @('&(copy|#169);', "(c)"),
        @("&(reg|#174);", "(r)"),
        @("&nbsp;", " "),
        @("&(.{2,6});", "")
    ) | % { $html = $html -replace $_[0], $_[1] }
    # write-verbose "replaced entities: `n`n$html`n"

    # $a is never assigned in this script; with strict mode off it is $null,
    # so the expression below measures the words in $html alone
    return $html + $a | Measure-Object -word
}

and then run:

Html-ToText (new-object net.webclient).DownloadString("test.html")

the PowerShell output shows 4 words. How can I export that output from the PowerShell window to an Excel spreadsheet with a Words column and the count 4?

2 answers:

Answer 0 (score: 0)

The CSV you want looks like this:

Words
4

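Writing that to a text file is easy, and Excel will read it. But you're in luck: the output of Measure-Object is already an object with 'Words' as a property and '4' as its value, so you can pipe it straight into Export-Csv. Use select-object to pick the properties you want: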

$x = Html-ToText (new-object net.webclient).DownloadString("test.html")

# drop the Lines/Characters/etc fields, just export words
$x | select-Object Words | Export-Csv out.csv -NoTypeInformation
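
To check the shape of the exported file without the HTML page at hand, here is a minimal sketch of the same pipeline. The sample string and the temp-file name `wordcount-demo.csv` are assumptions made up for illustration; they are not part of the original script.

```powershell
# Sample text stands in for the stripped HTML (illustrative assumption).
$text = 'one two three four'

# Measure-Object -Word returns an object that carries a Words property.
$measure = $text | Measure-Object -Word

# Hypothetical output path in the temp directory.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'wordcount-demo.csv'

# Keep only the Words column; the file gets a header row plus one data row.
$measure | Select-Object Words | Export-Csv -Path $csvPath -NoTypeInformation

# Read the file back to see the rows a spreadsheet would show.
$rows = @(Import-Csv -Path $csvPath)
$rows[0].Words
```

On a machine where .csv files are associated with Excel, `Invoke-Item $csvPath` should open the result there. Note that Import-Csv returns every value as a string, so the count comes back as the text 4 rather than a number.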

I'd be curious whether I could use

$x = Invoke-WebRequest http://www.google.com
$x.AllElements.InnerText

to get the words out of the HTML before trying to strip content with replace.

Answer 1 (score: 0)

I figured it out. I appended

+ $a | Measure-Object -Word

after the $html variable in the script's return statement, then ran:

Html-ToText (new-object net.webclient).DownloadString("test.html") | select-Object Words | Export-Csv out.csv -NoTypeInformation

and it exported the word count. - josh s, 1 minute ago
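
As a rough sketch of why that return statement works: `return` hands back the result of the whole pipeline, and an unassigned `$a` evaluates to `$null`, which adds nothing to the string. The function name `Get-WordCountDemo` and the sample text are hypothetical, and the sketch assumes strict mode is off and that no `$a` is defined in the calling session.

```powershell
function Get-WordCountDemo {
    param([string] $text)
    # $a is deliberately left unassigned, mirroring the script in the question.
    # With strict mode off it is $null, so "$text + $a" is just $text.
    return $text + $a | Measure-Object -Word
}

$sample = 'alpha beta gamma delta'

# Result of the function, which includes the "+ $a" term.
$withA = Get-WordCountDemo $sample

# Result of measuring the string directly, for comparison.
$plain = $sample | Measure-Object -Word

$withA.Words
$plain.Words
```

Both lines print 4 under those assumptions. If a variable named `$a` does exist in a parent scope, its value is appended before counting, so dropping the `+ $a` term is probably the safer form.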