PowerShell: find a specific pattern

Date: 2015-07-31 19:10:30

Tags: regex powershell jira

I am trying to extract only my JIRA issue numbers from a text file, eliminating duplicates. This works fine in a shell script:

 cat /tmp/jira.txt | grep -oE '^[A-Z]+-[0-9]+' | sort -u

But I want to use PowerShell, and tried this:

$Jira_Num=Get-Content /tmp/jira.txt |  Select-String -Pattern '^[A-Z]+-[0-9]+' > "$outputDir\numbers.txt"

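However, this returns the whole line and does not eliminate duplicates either. I have tried regular expressions, but I am new to PowerShell and don't know how to use them here. Could someone please help?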

Sample Jira.txt file

 PRJ-2303 Modified the artifactName
 PRJ-2303 Modified comment
 JIRA-1034 changed url to tag the prj projects
 JIRA-1000 for release 1.1
 JIRA-1000 Content modification

Expected output

 PRJ-2303
 JIRA-1034
 JIRA-1000

2 answers:

Answer 0 (score: 2)

Something like this should work:

$Jira_Num = Get-Content /tmp/jira.txt | ForEach-Object { 
    if ($_ -match '^([A-Z]+-[0-9]+)') {
        $Matches[1]
    }
} | Select-Object -Unique

Get-Content reads the file line by line, emitting one string per line, so we can pipe it to other cmdlets to process each line.
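
To make the line-by-line behaviour concrete, here is a small sketch. It assumes a hypothetical sample file named jira-sample.txt in the temp directory, standing in for /tmp/jira.txt. By default Get-Content emits one string per line, which is why the ^ anchor applies to the start of each line; the -Raw switch (PowerShell 3.0 and later) returns the whole file as a single string instead.

```powershell
# Hypothetical sample file in the temp directory, standing in for /tmp/jira.txt.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'jira-sample.txt'
Set-Content -Path $path -Value @(
    'PRJ-2303 Modified the artifactName'
    'PRJ-2303 Modified comment'
    'JIRA-1034 changed url to tag the prj projects'
    'JIRA-1000 for release 1.1'
    'JIRA-1000 Content modification'
)

# Default behaviour: one string per line, so '^' anchors at the start of each line.
$lines = Get-Content -Path $path
"Lines read: $($lines.Count)"    # Lines read: 5
"First line: $($lines[0])"       # First line: PRJ-2303 Modified the artifactName

# -Raw returns the whole file as a single string instead.
$raw = Get-Content -Path $path -Raw
"Raw is a single string: $($raw -is [string])"    # Raw is a single string: True
```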

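ForEach-Object runs a script block for each item in the pipeline. Here we use the -match operator to run a regular-expression match against the line, with a capture group. When the match succeeds, -match fills the automatic $Matches variable, and we send the captured group ($Matches[1], the JIRA issue key) down the pipeline.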
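
A minimal sketch of what -match does with one line from the sample (the variable names are illustrative only). One detail worth knowing: -match is case-insensitive by default, so unlike grep, which is case-sensitive unless given -i, it would also accept a lowercase key; -cmatch is the case-sensitive form.

```powershell
$line = 'PRJ-2303 Modified the artifactName'

# On a single string, -match returns $true/$false and fills the automatic $Matches hashtable.
$key = $null
if ($line -match '^([A-Z]+-[0-9]+)') {
    $key = $Matches[1]    # first capture group
}
"Captured key: $key"      # Captured key: PRJ-2303

# -match ignores case by default; -cmatch is the case-sensitive variant.
$lower = 'prj-2303 lowercase key'
$insensitive = $lower -match '^([A-Z]+-[0-9]+)'     # True
$sensitive   = $lower -cmatch '^([A-Z]+-[0-9]+)'    # False
"-match: $insensitive, -cmatch: $sensitive"
```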

Select-Object -Unique compares the objects and returns only the unique ones.
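
One difference from the shell version: Select-Object -Unique keeps the first occurrence of each value in input order, whereas sort -u also sorts. If sorted output is wanted, Sort-Object -Unique is the closer equivalent. A small sketch using the keys from the sample file:

```powershell
# Keys as extracted from the sample file, duplicates included.
$keys = 'PRJ-2303', 'PRJ-2303', 'JIRA-1034', 'JIRA-1000', 'JIRA-1000'

# First occurrence of each value, in input order.
$inOrder = $keys | Select-Object -Unique
$inOrder -join ', '    # PRJ-2303, JIRA-1034, JIRA-1000

# Sorted and de-duplicated, closer to the shell's sort -u.
$sorted = $keys | Sort-Object -Unique
$sorted -join ', '     # JIRA-1000, JIRA-1034, PRJ-2303
```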

Answer 1 (score: 2)

Select-String can still work! The problem comes from a misconception about the returned object. Select-String returns [Microsoft.PowerShell.Commands.MatchInfo] objects, and their ToString() equivalent appears to be the whole matching line. I don't know what version of PowerShell you have, but this should do the trick:

$Jira_Num = Get-Content /tmp/jira.txt | 
    Select-String  -Pattern '^[A-Z]+-[0-9]+' | 
    Select-Object -ExpandProperty Matches | 
    Select-Object -ExpandProperty Value -Unique
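
To see what Select-String actually returns, the following sketch inspects a single MatchInfo object: Line holds the entire input line, while Matches holds the regex Match objects, whose Value is only the matched text. That is why the pipeline above expands Matches and then Value.

```powershell
$result = 'PRJ-2303 Modified comment' | Select-String -Pattern '^[A-Z]+-[0-9]+'

$result.GetType().FullName    # Microsoft.PowerShell.Commands.MatchInfo
$result.Line                  # PRJ-2303 Modified comment (the whole line)
$result.Matches[0].Value      # PRJ-2303 (only the matched text)
```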

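Also, you can get odd results when you write to an output stream and a variable at the same time: in your attempt, the > redirection sends the pipeline's output to the file, so nothing is left to assign to $Jira_Num. It is generally better to use Tee-Object in cases like that.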

Select-String /tmp/jira.txt -Pattern '^[A-Z]+-[0-9]+' | 
    Select-Object -ExpandProperty Matches | 
    Select-Object -ExpandProperty Value -Unique | 
    Tee-Object -Variable Jira_Num | 
    Set-Content "$outputDir\numbers.txt"

Now both the file $outputDir\numbers.txt and the variable $Jira_Num contain the unique list. Leaving the $ off Jira_Num in the Tee-Object call is deliberate: -Variable takes the variable's name, not its value.
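
The difference between the two approaches can be checked with a short sketch. The file names are hypothetical (both in the temp directory) and the sample lines are copied from the question. The first assignment mirrors the question's attempt, where the > redirection consumes the output and the variable receives nothing; the Tee-Object pipeline from this answer fills both the variable and the file.

```powershell
# Hypothetical input and output files in the temp directory.
$tmp     = [System.IO.Path]::GetTempPath()
$path    = Join-Path $tmp 'jira-sample.txt'
$outFile = Join-Path $tmp 'numbers.txt'
Set-Content -Path $path -Value @(
    'PRJ-2303 Modified the artifactName'
    'PRJ-2303 Modified comment'
    'JIRA-1034 changed url to tag the prj projects'
    'JIRA-1000 for release 1.1'
    'JIRA-1000 Content modification'
)

# Mirrors the question's attempt: '>' consumes the output, so the variable stays empty.
$viaRedirect = Get-Content -Path $path | Select-String -Pattern '^[A-Z]+-[0-9]+' > $outFile
"Variable after redirect is null: $($null -eq $viaRedirect)"

# The Tee-Object version fills both the variable and the file.
Select-String -Path $path -Pattern '^[A-Z]+-[0-9]+' |
    Select-Object -ExpandProperty Matches |
    Select-Object -ExpandProperty Value -Unique |
    Tee-Object -Variable Jira_Num |
    Set-Content -Path $outFile

"Variable: $($Jira_Num -join ', ')"
"File:     $((Get-Content -Path $outFile) -join ', ')"
```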