Regex to get the substring match closest to the end of a string

Date: 2016-11-11 03:04:08

Tags: regex powershell

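I'm trying to extract a substring from a string using a PowerShell script and a regular expression.

Specifically, I want to extract the year that is part of a file name.

Example file name: "Expo.2000.Brazilian.Pavillon.after.Something.2016.SomeTextIDontNeed.jpg". The problem is that the regex gives me "2000" and no other match, but I need to get "2016". Sadly, $matches contains only a single match. Am I missing something? This is driving me crazy ;)

If $matches contained all instances found, I could simply take the last one: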

$Year = $matches[$matches.Count-1]

PowerShell code:

# Function to get the images year and clean up image information after it.
Function Remove-String-Behind-Year
{
    param
    (
        [string]$OriginalFileName # Provide the BaseName of the image file.
    )
    [Regex]$RegExYear = [Regex]"(?<=\.)\d{4}(?=\.|$)" # Regex to match a four digit string, prepended by a dot and followed by a dot or the end of the string.
    $OriginalFileName -match $RegExYear # Matches the Original Filename with the Regex
    Write-Host "Count: " $matches.Count # Why I only get 1 result?
    Write-Host "BLA: " $matches[0] # First and only match is "2000"
}

Desired results:

"x.2000.y.2016.z" => "2016" (Does not work)
"x.y.2016" => "2016" (Works)
"x.y.2016.z" => "2016" (Works)
"x.y.20164.z" => "" (Works)
"x.y.201.z" => "" (Works)

1 answer:

Answer 0: (score: 0)

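  • PowerShell's -match operator only ever finds (at most) one match, although capture groups let you extract multiple substrings of that one match.
  • However, because the quantifier * is greedy by default, we can still use that single match to find the last occurrence in the input:
    -match '^.*\.(\d{4})\b' finds the longest prefix of the input that ends in a 4-digit sequence preceded by a literal . and followed by a word boundary, so $matches[1] then contains the last occurrence of such a 4-digit sequence.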
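To make the role of greediness concrete, here is a minimal sketch that runs the same pattern twice on the question's sample input, once with the greedy .* and once with the lazy .*? variant. The variable names $name, $greedy and $lazy are made up for this illustration.

```powershell
# Illustration only: $name, $greedy and $lazy are hypothetical variable names.
$name = 'x.2000.y.2016.z'

# Greedy .* makes the prefix as long as possible, so the capture group
# ends up holding the LAST ".dddd" sequence.
if ($name -match '^.*\.(\d{4})\b') { $greedy = $matches[1] }

# Lazy .*? makes the prefix as short as possible, so the capture group
# ends up holding the FIRST ".dddd" sequence.
if ($name -match '^.*?\.(\d{4})\b') { $lazy = $matches[1] }

"greedy: $greedy"   # greedy: 2016
"lazy:   $lazy"     # lazy:   2000
```

In both cases $matches[0] is the whole matched prefix; only the capture group in $matches[1] is the year.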
Function Extract-Year
{
  param
  (
    [string] $OriginalFileName # Provide the BaseName of the image file.
  )

  if ($OriginalFileName -match '^.*\.(\d{4})\b') {
    $matches[1] # output last 4-digit sequence found
  } else {
    '' # output empty string to indicate that no 4-digit sequence was found.
  }
}

'x.2000.y.2016.z', 'x.y.2016', 'x.y.2016.z', 'x.y.20164.z', 'x.y.201.z' | 
  % { Extract-Year $_ }

Output:

2016
2016
2016
# empty line
# empty line
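
If you would rather have every match available, as the question originally hoped $matches would provide, one alternative is to call the .NET [regex]::Matches() method directly and take the last element. The following is only a sketch: the function name Get-LastYear is hypothetical, and it reuses the lookaround pattern from the question rather than the one above.

```powershell
# Hypothetical helper for illustration; not part of the original answer.
function Get-LastYear
{
  param
  (
    [string] $OriginalFileName # Provide the BaseName of the image file.
  )

  # The question's pattern: 4 digits preceded by a dot and followed by
  # a dot or the end of the string.
  $allMatches = [regex]::Matches($OriginalFileName, '(?<=\.)\d{4}(?=\.|$)')

  if ($allMatches.Count -gt 0) {
    $allMatches[$allMatches.Count - 1].Value # output the last match
  } else {
    '' # output empty string to indicate that nothing matched
  }
}

'x.2000.y.2016.z', 'x.y.2016', 'x.y.2016.z', 'x.y.20164.z', 'x.y.201.z' |
  ForEach-Object { Get-LastYear $_ }
```

For the five sample inputs this should yield the same results as Extract-Year. Note that the two patterns are not equivalent in general: the lookahead here requires a dot or the end of the string after the digits, whereas \b also accepts other non-word characters such as a hyphen.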