I'm trying to extract only the JIRA issue numbers from a text file, with duplicates removed. This works fine in a shell script:
cat /tmp/jira.txt | grep -oE '^[A-Z]+-[0-9]+' | sort -u
But I want to use PowerShell, and I tried this:
$Jira_Num=Get-Content /tmp/jira.txt | Select-String -Pattern '^[A-Z]+-[0-9]+' > "$outputDir\numbers.txt"
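However, this returns the whole line and does not remove duplicates either. I tried regular expressions, but I'm new to PowerShell and don't know how to use them there. Can someone please help?
Sample jira.txt file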
PRJ-2303 Modified the artifactName
PRJ-2303 Modified comment
JIRA-1034 changed url to tag the prj projects
JIRA-1000 for release 1.1
JIRA-1000 Content modification
Expected output
PRJ-2303
JIRA-1034
JIRA-1000
Answer 0 (score: 2)
Something like this should work:
$Jira_Num = Get-Content /tmp/jira.txt | ForEach-Object {
    if ($_ -match '^([A-Z]+-[0-9]+)') {
        $Matches[1]
    }
} | Select-Object -Unique
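Get-Content reads the file line by line, so we can pipe it to other cmdlets to process each line.
ForEach-Object runs a script block for each item in the pipeline. Here we use the -match operator to test each line against the regular expression, which includes a capture group. If the match succeeds, we send the captured group (the JIRA issue key) down the pipeline.
Select-Object -Unique compares the objects and returns only the unique ones.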
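To see the three steps in isolation, here is a minimal sketch that runs without a file. It assumes the sample lines from the question, inlined as an array; the variable names $lines and $keys are only for illustration.

```powershell
# Sample lines copied from the question's jira.txt, inlined so no file is needed.
$lines = @(
    'PRJ-2303 Modified the artifactName'
    'PRJ-2303 Modified comment'
    'JIRA-1034 changed url to tag the prj projects'
    'JIRA-1000 for release 1.1'
    'JIRA-1000 Content modification'
)

$keys = $lines | ForEach-Object {
    # -match returns $true or $false and fills the automatic $Matches table on success.
    if ($_ -match '^([A-Z]+-[0-9]+)') {
        $Matches[1]   # capture group 1: the issue key only
    }
} | Select-Object -Unique   # keeps the first occurrence of each key, in input order

$keys
```

Note that -match is case-insensitive by default, unlike grep -E. If lowercase prefixes must not match, -cmatch is the case-sensitive variant. Also, the result is de-duplicated but not sorted; piping to Sort-Object -Unique instead would be closer to what sort -u does.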
Answer 1 (score: 2)
Select-String can still work! The problem comes from a misconception about the object it returns. It returns [Microsoft.PowerShell.Commands.MatchInfo] objects, and it would appear that their ToString() equivalent is the whole matching line. I don't know what version of PowerShell you have, but this should do the trick.
$Jira_Num = Get-Content /tmp/jira.txt |
Select-String -Pattern '^[A-Z]+-[0-9]+' |
Select-Object -ExpandProperty Matches |
Select-Object -ExpandProperty Value -Unique
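As a quick way to see why the pipeline above expands Matches and then Value, this small sketch inspects a single MatchInfo object. It assumes one sample line from the question piped in as a string; $info is a name chosen only for this illustration.

```powershell
# Pipe one sample line through Select-String and keep the resulting object.
$info = 'PRJ-2303 Modified the artifactName' |
    Select-String -Pattern '^[A-Z]+-[0-9]+'

$info.GetType().FullName   # the MatchInfo type mentioned above
$info.Line                 # the whole input line
$info.Matches[0].Value     # only the text the pattern matched
```

The Line property holds the full line, while each entry in Matches is a regex match whose Value is just the matched text, which is the part the question wants.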
Also, you can get odd results when you write to an output stream and a variable at the same time. It is generally better to use Tee-Object in cases like that.
Select-String /tmp/jira.txt -Pattern '^[A-Z]+-[0-9]+' |
Select-Object -ExpandProperty Matches |
Select-Object -ExpandProperty Value -Unique |
Tee-Object -Variable Jira_Num |
Set-Content "$outputDir\numbers.txt"
Now both the file $outputDir\numbers.txt and the variable $Jira_Num contain the unique list. The $ was left off the variable name passed to Tee-Object on purpose: -Variable expects the name of the variable, not its value.
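The Tee-Object behaviour can be checked with a small self-contained sketch. The temp file name numbers-demo.txt and the variable name demoKeys are hypothetical and used only here; the input is a short inline list rather than the question's file.

```powershell
# Hypothetical output path in the temp directory, used only for this demo.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'numbers-demo.txt'

'PRJ-2303', 'JIRA-1034', 'PRJ-2303' |
    Select-Object -Unique |
    Tee-Object -Variable demoKeys |   # name only, no leading $
    Set-Content -Path $outFile

$demoKeys              # the copy captured in the variable
Get-Content $outFile   # the copy written to the file
```

Writing -Variable $demoKeys instead would expand the variable first and pass its current value as the name, which is usually not what is intended.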