Question

I have a file, input.txt, that contains the following text:

GRP123456789
    123456789012
GRP234567890
    234567890123
GRP456789012
    "A lot of text. More text. Blah blah blah: Foobar." (Source Error) (Blah blah blah)
GRP567890123
    Source Error
GRP678901234
    Source Error
GRP789012345
    345678901234
    456789012345

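I am trying to capture every occurrence of "GRP#########" on the condition that the next line contains at least one digit.

So GRP123456789 is valid, but GRP456789012 and GRP678901234 are not.

The regex pattern I came up with on http://regexstorm.net/tester is: (GRP[0-9]{9})\s\n\s+[0-9]

The PowerShell script I have so far, based on this site http://techtalk.gfi.com/windows-powershell-extracting-strings-using-regular-expressions/, is: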

$input_path = 'C:\Users\rtaite\Desktop\input.txt'
$output_file = 'C:\Users\rtaite\Desktop\output.txt'

$regex = '(GRP[0-9]{9})\s\n\s+[0-9]'

select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Values } > $output_file

I get no output at all, and I don't know why.

Any help with this would be appreciated, as I am just trying to understand this better.

Answer 1

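You need to convert the text input into a single string before passing it to Select-String. Otherwise the cmdlet operates on each line separately, so a pattern that spans two lines never finds a match.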

Get-Content $input_path | Out-String |
    Select-String $regex -AllMatches |
    Select-Object -Expand Matches |
    ForEach-Object { $_.Groups[1].Value } |
    Set-Content $output_file

If you are using PowerShell v3 or newer, you can replace Get-Content | Out-String with Get-Content -Raw.
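To see the difference between the two modes, here is a minimal sketch. It assumes a small hypothetical sample in place of input.txt, and it joins the lines with a bare "`n", so the pattern is relaxed from \s\n to \s*\n (the original \s\n relies on a carriage return or other whitespace sitting before the newline).

```powershell
# Hypothetical sample data standing in for input.txt
$lines = @(
    'GRP123456789'
    '    123456789012'
    'GRP456789012'
    '    Source Error'
)

# Assumption: \s*\n instead of \s\n, because the lines are joined with "`n" only
$regex = '(GRP[0-9]{9})\s*\n\s+[0-9]'

# Line by line: no single line holds both the GRP id and the digits that follow
$perLine = @($lines | Select-String -Pattern $regex -AllMatches)

# One joined string: the pattern can now cross the line break
$text = $lines -join "`n"
$joined = @($text | Select-String -Pattern $regex -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value })

"per-line matches: $($perLine.Count)"
"joined matches:   $($joined -join ', ')"
```

With this sample, the per-line run finds nothing, while the joined string yields GRP123456789.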

Answer 2

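To pull strings out of a text file using a pattern, the best tool for the job is Select-String. It also has a parameter named -Context, which lets you capture the lines before or after the matching line, which makes it well suited to this problem.

So my solution looks like this: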

Select-String 'input.txt' -Pattern '^GRP[0-9]{9}' -Context 0, 1 | ? {
    $_.Context.PostContext -match '\d'
} | Select -ExpandProperty line | Set-Content 'output_file.txt'
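To make the filter above easier to follow, this sketch prints what -Context 0, 1 attaches to each match. It assumes a hypothetical two-record sample written to a temporary file, and it uses [pscustomobject], which needs PowerShell v3 or newer; the property names Line, NextLine and Keep are made up for the illustration.

```powershell
# Hypothetical sample data standing in for input.txt
$lines = @(
    'GRP123456789'
    '    123456789012'
    'GRP678901234'
    '    Source Error'
)

# Write the sample to a temporary file so Select-String can read it by path
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value $lines

# -Context 0, 1 keeps zero lines before and one line after each match
$report = @(Select-String -Path $tmp -Pattern '^GRP[0-9]{9}' -Context 0, 1 |
    ForEach-Object {
        [pscustomobject]@{
            Line     = $_.Line                      # the matching GRP line
            NextLine = $_.Context.PostContext[0]    # the single line that follows
            Keep     = [bool]($_.Context.PostContext -match '\d')
        }
    })

$report
Remove-Item $tmp
```

Keep is true only where the following line contains a digit, which is the same test the Where-Object filter applies.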

Answer 3

Using

[regex]::Matches($(Get-Content '.\Desktop\new 1.txt'), "GRP\d+(?=\s+\d)") |
    % { $_.value | Out-File .\Desktop\new-1-matches.txt -Append }

I get the following output from your sample file:

GRP123456789
GRP234567890
GRP789012345
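Why the one-liner works is not spelled out, so here is a rough sketch of what appears to happen, using a hypothetical sample array in place of Get-Content. When PowerShell converts an array to a string, it joins the elements with spaces by default, so the line break becomes ordinary whitespace that the lookahead (?=\s+\d) can scan across.

```powershell
# Hypothetical sample data standing in for the output of Get-Content
$lines = @(
    'GRP123456789'
    '    123456789012'
    'GRP678901234'
    '    Source Error'
)

# Converting an array to [string] joins the elements with spaces (default $OFS)
$asString = [string]$lines

# (?=\s+\d) requires whitespace followed by a digit, without consuming either
$ids = @([regex]::Matches($asString, 'GRP\d+(?=\s+\d)') |
    ForEach-Object { $_.Value })

$ids
```

One consequence of joining with spaces: a GRP id followed by a space and a digit on the same line would match as well, so this approach fits input where each id sits alone on its line.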

Print PowerShell Regex captures to output file
