PowerShell RegEx - capturing "too much" (not respecting the non-greedy indicator?)

Date: 2018-08-10 20:45:32

Tags: regex xml powershell

The code below returns:

partner=<Partner>
 more stuff <Name>Test</Name>
 other things </Partner>  <Partner>
 more stuff <Name>CompanyX</Name>
 other things </Partner> 

But I want it to return:

partner=<Partner>
 more stuff <Name>CompanyX</Name>
 other things </Partner> 

Sample code:

$partyName = "CompanyX" 

#$bindings = [IO.File]::ReadAllText($inputFileName)

$bindings = "starting stuff <Partner>`r`n more stuff <Name>Test</Name>`n other things </Partner>  <Partner>`r`n more stuff <Name>CompanyX</Name>`n other things </Partner> ending stuff" 


$found = $bindings -match "(?s)(<Partner>.*?<Name>$partyName</Name>.*?</Partner>)"

if ($found) 
{
    Write-Host "matched"
    $partner = $matches[1]
}

Write-Host "partner=$partner "

2 Answers:

Answer 0: (score: 3)

As TheIncorrigible1 said: use an XML parser instead of a regex.
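To make the XML-parser suggestion concrete, here is a minimal sketch. It assumes the real input is a well-formed XML document; the <Bindings> root element and the <Id> elements are hypothetical and only there to make the sample self-contained.

```powershell
# Hypothetical, well-formed stand-in for the real file; the <Bindings> root
# and the <Id> elements are assumptions made for this sketch.
[xml] $doc = @'
<Bindings>
  <Partner><Id>1</Id><Name>Test</Name></Partner>
  <Partner><Id>2</Id><Name>CompanyX</Name></Partner>
</Bindings>
'@

$partyName = 'CompanyX'

# XPath: any <Partner> element that has a <Name> child with the given text.
# Note: a name containing a single quote would need extra handling here.
$partner = $doc.SelectSingleNode("//Partner[Name='$partyName']")

if ($partner) {
    Write-Host "partner=$($partner.OuterXml)"
} else {
    Write-Warning 'No match.'
}
```

The sample string in the question is a fragment with surrounding text, so this only applies if the actual file parses as XML.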

That said, since your reason for doing this with a regex may simply be to see whether and how it can be done, you can use:

$found = $bindings -match "(?sx)(<Partner>(?:((?!</Partner>).)+<Name>$([Regex]::Escape($partyName))</Name>)(?:((?!</Partner>).))*</Partner>)"
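Because the pattern above turns on the x (free-spacing) option, it can also be spread over several lines with comments. The following is a sketch of such a variant, run against the sample string from the question. It is not the answer's exact pattern: the inner groups are made non-capturing and the outer capture group is dropped, so the result is read from $Matches[0]. The variable names $escapedName and $pattern are introduced for illustration.

```powershell
$partyName = 'CompanyX'
$bindings  = "starting stuff <Partner>`r`n more stuff <Name>Test</Name>`n other things </Partner>  <Partner>`r`n more stuff <Name>CompanyX</Name>`n other things </Partner> ending stuff"

# Escape the name so that it is treated literally inside the regex.
$escapedName = [regex]::Escape($partyName)

# With (?x), unescaped whitespace is ignored and # starts a comment.
$pattern = @"
(?sx)
<Partner>                   # opening tag of a candidate block
(?: (?!</Partner>) . )+     # one or more characters, never stepping over a closing tag
<Name>$escapedName</Name>   # the name element of interest
(?: (?!</Partner>) . )*     # rest of the block, again without crossing a closing tag
</Partner>                  # closing tag of the same block
"@

if ($bindings -match $pattern) {
    $partner = $Matches[0]
    Write-Host "partner=$partner"
} else {
    Write-Warning 'No match.'
}
```

The look-ahead in front of each dot is what keeps the match inside a single <Partner> block: starting from the first <Partner>, the engine cannot reach the CompanyX name without crossing </Partner>, so that attempt fails and the match begins at the second block.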

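Answer 1: (score: 0)

The non-greedy quantifiers (.*?) are honored, but they are not sufficient in this case:

<Partner>.*?<Name>$partyName</Name> matches from a <Partner> tag to the next instance of the <Name> element of interest, but that does not guarantee that there are no further <Partner> tags in between.
In other words: because the regex engine reports the leftmost possible match, your regex will always match from the first <Partner> tag through the <Name> element of interest.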
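This can be checked with a small sketch that reuses the sample string and pattern from the question and compares where the match starts with the positions of the <Partner> tags. The variable names $m, $firstTag, and $lastTag are introduced here for illustration.

```powershell
$partyName = 'CompanyX'
$bindings  = "starting stuff <Partner>`r`n more stuff <Name>Test</Name>`n other things </Partner>  <Partner>`r`n more stuff <Name>CompanyX</Name>`n other things </Partner> ending stuff"

# The question's pattern, unchanged.
$m = [regex]::Match($bindings, "(?s)(<Partner>.*?<Name>$partyName</Name>.*?</Partner>)")

# Positions of the first and the last <Partner> tag (ordinal comparison).
$firstTag = $bindings.IndexOf('<Partner>', [StringComparison]::Ordinal)
$lastTag  = $bindings.LastIndexOf('<Partner>', [StringComparison]::Ordinal)

"match starts at $($m.Index); first <Partner> at $firstTag; wanted <Partner> at $lastTag"
```

The match starts at the first tag rather than at the wanted one: the non-greedy .*? only makes the match end as early as possible from a given starting position; it does not move the starting position forward.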

To prevent this, you need a negative look-ahead assertion, (?!...), to rule out intervening <Partner> tags:

# Sample input, defined as a here-string.
$bindings = @'
starting stuff <Partner>
more stuff <Name>Test</Name>
 other things </Partner> <Partner>
 stuff of interest before <Name>CompanyX</Name>
 stuff of interest after </Partner> even more </Partner> ending stuff
'@ 

# Escape the name to ensure it is treated as a literal inside the regex.
# Note: Not strictly necessary for sample value 'CompanyX'
$partyName = [regex]::Escape('CompanyX')

# Use a negative look-ahead assertion - (?!...) - to rule out intervening
# <Partner> tags before the <Name> element of interest.
if ($bindings -match "(?s)<Partner>((?!<Partner>).)*<Name>$partyName</Name>.*?</Partner>") {
  # Output the match.
  $matches[0]
} else { 
  Write-Warning 'No match.'
}
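
The above yields:

<Partner>
 stuff of interest before <Name>CompanyX</Name>
 stuff of interest after </Partner>

  • (?!<Partner>). matches a single character (.), but only at a position where the string <Partner> does not start.

  • This subexpression must match every character (if any) between the opening <Partner> and the <Name> element of interest, hence it is wrapped in (...)*.

    • I presume this makes for an inefficient matching algorithm, but it does work.
      As stated, proper XML parsing with an XPath query is worth considering as an alternative.

    • You can make matching more efficient by using (?:...)* as the wrapper, which tells the regex engine not to capture the (latest) subexpression match. ((...) is a capture group, meaning that what the subexpression matched is reported as part of the automatic variable $Matches; that is not needed here, so ?: suppresses it.)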
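The difference between the two wrappers can be seen in $Matches. The sketch below reuses this answer's sample input and compares the capturing and the non-capturing form; the variable names $capturing, $nonCapturing, and the $with... / $text... helpers are introduced for illustration. It only shows what ends up in $Matches and makes no measurement of speed.

```powershell
$bindings = @'
starting stuff <Partner>
more stuff <Name>Test</Name>
 other things </Partner> <Partner>
 stuff of interest before <Name>CompanyX</Name>
 stuff of interest after </Partner> even more </Partner> ending stuff
'@

$partyName = [regex]::Escape('CompanyX')

# Same pattern twice; only the wrapper around the look-ahead differs.
$capturing    = "(?s)<Partner>((?!<Partner>).)*<Name>$partyName</Name>.*?</Partner>"
$nonCapturing = "(?s)<Partner>(?:(?!<Partner>).)*<Name>$partyName</Name>.*?</Partner>"

$null = $bindings -match $capturing
$withGroupCount = $Matches.Count     # entry 0 plus entry 1 for the capture group
$withGroupText  = $Matches[0]

$null = $bindings -match $nonCapturing
$withoutGroupCount = $Matches.Count  # entry 0 only
$withoutGroupText  = $Matches[0]

"capturing: $withGroupCount entries; non-capturing: $withoutGroupCount entries"
"same overall match: $($withGroupText -eq $withoutGroupText)"
```

Both forms return the same overall match in $Matches[0]; the capturing form additionally stores the last character matched by the group as $Matches[1].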