Extract (repeating) groups containing parentheses using regex

Time: 2018-12-04 17:28:29

Tags: regex powershell

My string:

(01) this is value one (02) and this is 2 (03) and this is number 3

Desired result (key/value pairs):

(01)    this is value one  
(02)    and this is 2   
(03)    and this is number 3

My code so far:

$s="(01) this is value one (02) and this is 2 (03) and this is number 3" 
$pattern  = '(\(\d\d\))(.*)' 
$m = $s | select-string $pattern -AllMatches | % {$_.matches} | ForEach-Object { $_.Groups[1].Value }

How can I do this?

4 answers:

Answer 0: (score: 1)

I was able to get your desired output with the following:

PS H:\> $pattern = '(\(\d\d\))([^(]*)'
PS H:\> $results = $s | Select-String $pattern -AllMatches
PS H:\> $results.Matches.Value
(01) this is value one
(02) and this is 2
(03) and this is number 3

Edit: accessing the matched groups:

PS H:\> $results.Matches.Captures.Groups[0].value
(01) this is value one
PS H:\> $results.Matches.Captures.Groups[1].value
(01)
PS H:\> $results.Matches.Captures.Groups[2].value
 this is value one
PS H:\> $results.Matches.Captures.Groups[3].value
(02) and this is 2
PS H:\> $results.Matches.Captures.Groups[4].value
(02)
PS H:\> $results.Matches.Captures.Groups[5].value
 and this is 2
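The flat Groups indexing above interleaves whole matches, keys, and values, so it can be easier to walk the matches one at a time. The following is only a sketch of that idea, assuming PowerShell 3.0 or later (for [pscustomobject]); the variable name $pairs and the property names Key and Value are illustrative and not part of the original answer.

```powershell
# Sketch: pair group 1 (the key) with group 2 (the value) for every match.
$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'
$pattern = '(\(\d\d\))([^(]*)'

$pairs = foreach ($match in [regex]::Matches($s, $pattern)) {
    [pscustomobject]@{
        Key   = $match.Groups[1].Value
        # Group 2 starts with the space after ")", so trim it.
        Value = $match.Groups[2].Value.Trim()
    }
}

$pairs | Format-Table -AutoSize
```

Each element of $pairs then carries one key and its trimmed value, which avoids having to compute offsets such as Groups[4] by hand.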

Answer 1: (score: 1)

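Here is an alternative that uses string methods instead of regex. It also stores the output in an ordered hashtable. The [ordered] is only for convenience: I wanted the display to be in order so that I could confirm the output was as expected.

I rewrote the "blank items" filter to use Where-Object instead of .Where(), because the OP is on a PowerShell version earlier than v4.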

# fake reading in a text file
#    in real life, use Get-Content
$InStuff = @'
(01) this is value one (02) and this is 2 (03) and this is number 3
(01) One Bravo (03) Three Bravo
(02) Two Charlie
(111) OneThrice Delta (666) Santa Delta
(01) One Echo (03) Three Echo (05) Five Echo
'@ -split [environment]::NewLine

$LookupTable = [ordered]@{}

foreach ($IS_Item in $InStuff)
    {
    # OP cannot use the ".Where()" array method - that was added in ps4
    #foreach ($Split_Item in $IS_Item.Split('(').Where({$_}))
    $Split_ISI = $IS_Item.Split('(') |
        # this gets rid of the empty items
        Where-Object {$_}

    foreach ($SI_Item in $Split_ISI)
        {
        $Key = $SI_Item.Split(')')[0].Trim()
        $Value = $SI_Item.Split(')')[1].Trim()
        # the leading comma forces the input to be an array
        $LookupTable[$Key] += ,$Value
        }
    }

$LookupTable | Out-Host

$LookupTable['01'][0] | Out-Host
$LookupTable['02'][1] | Out-Host

Output ...

Name                           Value
----                           -----
01                             {this is value one, One Bravo, One Echo}
02                             {and this is 2, Two Charlie}
03                             {and this is number 3, Three Bravo, Three Echo}
111                            {OneThrice Delta}
666                            {Santa Delta}
05                             {Five Echo}


this is value one
Two Charlie

The main gotcha here is that the lookup keys are strings, so a direct reference must quote the number: '01' rather than 01.
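A small sketch of that gotcha, assuming PowerShell 3.0 or later (for [ordered]); the table name and values are made up for illustration. As far as I know, an integer index on an ordered dictionary is treated as a zero-based position rather than as a key, which is why the unquoted form does not find '01'.

```powershell
# Hypothetical two-entry table, keyed by strings as in the answer above.
$table = [ordered]@{}
$table['01'] = 'first'
$table['02'] = 'second'

# String key: looks up the entry whose key is the string '01'.
$byKey = $table['01']

# Unquoted 01 is parsed as the integer 1, which an ordered dictionary
# treats as a position (zero-based), not as the key '01'.
$byNumber = $table[01]

"by key: $byKey"
"by number: $byNumber"
```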

Answer 2: (score: 1)

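Since you are looking for key-value pairs, it makes sense to collect them in an (ordered) hashtable.

The splitting can be done with the regex-based -split operator, which can also include parts of what the separator regex matches in the output array, by way of a capture group ((...)).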

# Input string
$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'

# Initialize the output hashtable
$ht = [ordered] @{}

# Split the input string and fill the hashtable.
$i = 0; 
$s -split '(\(\d+\)) ' -ne '' | ForEach-Object { 
  if (++$i % 2) { $key = $_ } else { $ht[$key] = $_ }
}

# Output the hashtable
$ht

The above yields:

Name                           Value
----                           -----
(01)                           this is value one 
(02)                           and this is 2 
(03)                           and this is number 3

Note: if you do not want the enclosing (...) to be part of the key (Name) property, use
-split '\((\d+)\) ' instead of -split '(\(\d+\)) '

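The approach above splits the string into array elements in which each pair of adjacent elements represents one key-value pair. The ForEach-Object call then adds these pairs to the output hashtable, deciding whether an input element is a key or a value based on whether its position is odd or even.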
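To make the alternating layout concrete, the sketch below stores the intermediate array and walks it two elements at a time. The variable name $tokens and the "=>" output format are only illustrative; the split expression is the one used above.

```powershell
$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'

# The capture group keeps each "(nn)" separator in the result;
# -ne '' drops the empty element produced before the first separator.
$tokens = $s -split '(\(\d+\)) ' -ne ''

# Even positions hold keys, odd positions hold the matching values.
for ($i = 0; $i -lt $tokens.Count; $i += 2) {
    '{0} => {1}' -f $tokens[$i], $tokens[$i + 1]
}
```

Note that every value except the last keeps its trailing space, because the separator regex only consumes the space after the closing parenthesis.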


As for what you tried:

Your regex '(\(\d\d\))(.*)' is too greedy: because of the .* subexpression, a single match on a given line consumes the entire rest of the line.

If you use the following regex instead, you get the desired matches:
'(\(\d+\)) ([^(]+)'

That is, after matching an index such as (01), match only up to, but not including, the next ( (if any).
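The difference can be checked by counting the matches each pattern produces. This is a minimal sketch using [regex]::Matches rather than Select-String; the variable names $greedy and $bounded are illustrative.

```powershell
$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'

# Greedy: .* runs to the end of the line, so there is only one match.
$greedy = [regex]::Matches($s, '(\(\d\d\))(.*)')

# Bounded: [^(]+ stops before the next "(", giving one match per pair.
$bounded = [regex]::Matches($s, '(\(\d+\)) ([^(]+)')

"greedy matches:  $($greedy.Count)"
"bounded matches: $($bounded.Count)"
```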

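Here it is in a streamlined version of your original command, which outputs the key-value pairs as an array of custom objects ([pscustomobject] instances):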

$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'
$pattern  = '(\(\d+\)) ([^(]+)'
$s | Select-String $pattern -AllMatches | ForEach-Object {
  $_.matches | Select-Object @{ n='Name';  e = { $_.Groups[1].Value } },
                             @{ n='Value'; e = { $_.Groups[2].Value } }
}

The above yields:

Name Value
---- -----
(01) this is value one 
(02) and this is 2 
(03) and this is number 3

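Note, however, that the code above outputs an array of custom objects, each representing one key-value pair. That differs from the solution at the top, which creates a single hashtable containing all key-value pairs.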

Answer 3: (score: 0)

To get the (xx) text followed by 4 spaces:

$s="(01) this is value one (02) and this is 2 (03) and this is number 3"
$s -replace " (?=\(\d\d\))","`n" -replace "(?<=\(\d\d\)) +","    "

Sample output:

(01)    this is value one
(02)    and this is 2
(03)    and this is number 3

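The regexes above use zero-length lookarounds:

  • the first replaces the space preceding a (xx) with a newline (`n)
  • the second replaces any number of spaces following a (xx) with exactly 4 spaces.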