I have a bunch of XML files, and I want to detect and remove the empty tags in them. For example:
<My></My>
<Your/>
<sometags>
<his>
</his>
<hasContent>sdfaf</hasContent>
</sometags>
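All of these kinds of empty tag (My, Your, his) should be removed. Does PowerShell support this kind of empty-tag detection, no matter how deeply the tags are nested inside other tags?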
Answer 0 (score: 6)
function Format-XML
{
    param (
        [parameter(Mandatory = $true)][xml] $xml,
        [parameter(Mandatory = $false)][int] $indent = 4
    )
    try
    {
        $Error.Clear()
        # Write the document through an indenting XmlTextWriter into a string
        $StringWriter = New-Object System.IO.StringWriter
        $XmlWriter = New-Object System.Xml.XmlTextWriter $StringWriter
        $XmlWriter.Formatting = "indented"
        $XmlWriter.Indentation = $indent
        $xml.WriteContentTo($XmlWriter)
        $XmlWriter.Flush()
        $StringWriter.Flush()
        return $StringWriter.ToString()
    }
    catch
    {
        Write-Host "$($MyInvocation.InvocationName): $_"; return $null
    }
}
$xml = [xml] @"
<document>
<My></My>
<Your/>
<sometags>
<his>
</his>
<hasContent>sdfaf</hasContent>
</sometags>
</document>
"@
# The "magic" part is in this XPath expression
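# @(...) snapshots the matches so that removing nodes cannot disturb the enumeration
$nodes = @($xml.SelectNodes("//*[count(@*) = 0 and count(child::*) = 0 and not(string-length(text()) > 0)]"))
$nodes | %{
    # Discard RemoveChild's return value so the removed nodes are not echoed
    [void] $_.ParentNode.RemoveChild($_)
}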
Format-Xml $xml
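
The script above makes a single pass, so an element that only becomes empty after its children have been removed would be left in place. If those should go as well, one option is to repeat the query until nothing matches. The following is a minimal sketch under that assumption; the function name `Remove-EmptyXmlElement` is hypothetical, the root element is deliberately never removed, and the text test uses `not(normalize-space())` instead of the `string-length(text())` check above.

```powershell
function Remove-EmptyXmlElement {
    param (
        [Parameter(Mandatory = $true)][xml] $Xml
    )
    # Empty here means: no attributes, no child elements, and no text other
    # than whitespace. '/*//*' restricts the match to descendants of the root.
    $xpath = '/*//*[not(@*) and not(*) and not(normalize-space())]'
    $removed = 0
    while ($true) {
        # Snapshot the matches into an array before modifying the document.
        $nodes = @($Xml.SelectNodes($xpath))
        if ($nodes.Count -eq 0) { break }
        foreach ($node in $nodes) {
            [void] $node.ParentNode.RemoveChild($node)
            $removed++
        }
    }
    return $removed
}

# <inner/> is removed in the first pass, <outer> in the second.
$doc = [xml] '<document><outer><inner/></outer><keep>text</keep></document>'
$count = Remove-EmptyXmlElement $doc
$doc.OuterXml

# Hypothetical batch use over a folder of files (the path is a placeholder):
# Get-ChildItem -Path .\xml -Filter *.xml | ForEach-Object {
#     $d = New-Object System.Xml.XmlDocument
#     $d.Load($_.FullName)
#     [void] (Remove-EmptyXmlElement $d)
#     $d.Save($_.FullName)
# }
```

Because the `[xml]` parameter receives an existing `XmlDocument` by reference, the function edits the caller's document in place and returns only the number of elements it removed.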
Answer 1 (score: 1)
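I'm not fluent in PowerShell, so this only adds to @DavidBrabant's good answer, specifically the XPath part. The XPath for detecting empty elements can be somewhat simpler:
//*[not(@*) and not(*) and not(normalize-space())]
The predicate (everything inside the []) checks, in order, that the current element has no attributes, has no child elements, and has no text other than whitespace.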
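
To see which elements a predicate of this shape selects, it can be run against the sample from the question. This is a small sketch, not part of the original answer; the variable names are made up for the illustration. Note the `not(...)` around `normalize-space()`: a bare `normalize-space()` is true for elements that do contain text, which is the opposite of what is wanted here.

```powershell
# Sample document from the question, wrapped in a root element.
$xml = [xml] @"
<document>
<My></My>
<Your/>
<sometags>
<his>
</his>
<hasContent>sdfaf</hasContent>
</sometags>
</document>
"@

# not(@*): no attributes; not(*): no child elements;
# not(normalize-space()): no text other than whitespace.
$xpath = '//*[not(@*) and not(*) and not(normalize-space())]'

# Collect the names of the matching elements in document order.
$emptyNames = @($xml.SelectNodes($xpath) | ForEach-Object { $_.Name })
$emptyNames -join ', '   # My, Your, his
```

`document` and `sometags` are skipped because they have child elements, and `hasContent` is skipped because it has text.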
Answer 2 (score: 0)
You should look for a solution that uses System.Xml.XmlDocument. But it can also be done with a regular expression:
$xml = @"
<document>
<My></My>
<Your/>
<sometags>
<his>
</his>
<hasContent>sdfaf</hasContent>
</sometags>
</document>
"@
$xml -replace '(?:<(\w*)>\s*<\/\1>)|<(\w*)\/>', ''
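
A single `-replace` only removes tags that are empty at that moment. If a parent left empty by the replacement should also disappear, the replacement can be repeated until the text stops changing. The sketch below assumes that behaviour is wanted; the pattern is a slightly more tolerant variant written for this illustration (it also allows `.` and `-` in names and a space before `/>`), not the one from the answer. It still ignores attributes, comments, and CDATA, so the XmlDocument approach remains the more robust one.

```powershell
$text = '<document><outer><inner/></outer><keep>text</keep></document>'

# Either <name></name> with only whitespace in between, or <name/>.
$pattern = '<([\w.-]+)\s*>\s*</\1\s*>|<[\w.-]+\s*/>'

do {
    $previous = $text
    $text = $text -replace $pattern, ''
} while ($text -cne $previous)   # stop once a pass changes nothing

$text
```

The first pass removes `<inner/>`, the second removes the now empty `<outer></outer>`, and the third pass changes nothing and ends the loop.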