I have a large file that contains a "before" and an "after" line for each item, like this:
case1 (BEF) ACT
(AFT) BLK
case2 (BEF) ACT
(AFT) ACT
case3 (BEF) ACT
(AFT) CLC
...
I need to select all pairs that have (BEF) ACT on the "first" line and (AFT) BLK on the "second" line, and put the result into a file.
The idea is to create a clause like this:
IF (stringX.LineNumber consists of "(BEF) ACT" AND stringX+1.LineNumber consists of (AFT) BLK)
{OutFile $stringX+$stringX+1}
Sorry about the syntax, I've only just started using PS :)
$logfile = 'c:\temp\file.txt'
$matchphrase = '\(BEF\) ACT'
$linenum=Get-Content $logfile | Select-String $matchphrase | ForEach-Object {$_.LineNumber+1}
$linenum
#I've worked out how to get a line number after the line with first required phrase
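Expected result: a new file that contains each line with "(BEF) ACT" that is immediately followed by a line with "(AFT) BLK", together with that following line.
Answer 0 (score: 1)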
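Select-String -SimpleMatch -CaseSensitive '(BEF) ACT' c:\temp\file.txt -Context 0,1 |
  ForEach-Object {
    # PostContext is empty if the match is on the last line of the file,
    # in which case [0] yields $null; the -and guard avoids calling a method on $null.
    $lineAfter = $_.Context.PostContext[0]
    if ($lineAfter -and $lineAfter.Contains('(AFT) BLK')) {
      $_.Line, $lineAfter # output
    }
  } # | Set-Content ...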
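-SimpleMatch performs literal substring matching, which means that you can pass the search string as-is, without needing to escape it.
However, if you need to constrain the search further, such as making sure that the match occurs only at the end of a line ($), you do need a regular expression, passed to the (implied) -Pattern parameter: '\(BEF\) ACT$'
Also note that PowerShell is case-insensitive by default, which is why the -CaseSensitive switch is used.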
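To illustrate the anchored, case-sensitive regex variant mentioned above, here is a minimal sketch. It pipes a few made-up strings (not the asker's data) to Select-String instead of reading a file; the variable names are hypothetical.

```powershell
# Hypothetical sample lines, used only for illustration.
$lines = 'case1 (BEF) ACT', 'case2 (BEF) ACT trailing text', 'case3 (bef) act'

# Regex matching (no -SimpleMatch): the parentheses are escaped and
# $ anchors the match to the end of the line.
$hits = $lines | Select-String -CaseSensitive -Pattern '\(BEF\) ACT$'

# Only the first sample line both ends with the phrase and has the right case.
$hits.Line
```

Without -CaseSensitive, the third sample line would match as well.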
Note how Select-String accepts file paths directly, so there is no need for a preceding Get-Content call.
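-Context 0,1 captures 0 lines before and 1 line after each match, and includes them in the [Microsoft.PowerShell.Commands.MatchInfo] instances that Select-String outputs.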
Inside the ForEach-Object script block, $_.Context.PostContext[0] retrieves the line after the match, and .Contains() performs a literal substring search in it.
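.Contains() is a method of the .NET System.String type, and such methods, unlike PowerShell, are case-sensitive by default, but you can use an optional parameter to change that. If the substring is found on the next line, both the current line and the next line are output.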
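As a small sketch of the case-sensitivity difference, the following compares .Contains() with a case-insensitive check. It uses String.IndexOf with a StringComparison argument as a stand-in for the "optional parameter" (an assumption about which overload to use, chosen because it is available in both Windows PowerShell and PowerShell Core); the sample string is made up.

```powershell
# Hypothetical sample line in lowercase.
$line = '(aft) blk'

# .NET string methods are case-sensitive by default.
$caseSensitive = $line.Contains('(AFT) BLK')

# A case-insensitive substring test via IndexOf and an explicit comparison mode.
$caseInsensitive =
    $line.IndexOf('(AFT) BLK', [System.StringComparison]::OrdinalIgnoreCase) -ge 0

$caseSensitive    # False
$caseInsensitive  # True
```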
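The code above finds all matching pairs in the input file; if you only want the first pair, append | Select-Object -First 2 to the end of the pipeline (after the ForEach-Object block), given that each pair is output as two lines.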
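The following self-contained sketch runs the approach end to end against a temporary file. The file name and sample content are made up for illustration, and the `-and` check is an added guard for a match on the last line of the file.

```powershell
# Hypothetical sample data in a temporary file (not the asker's real file).
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'bef-aft-sample.txt'
Set-Content -Path $sample -Value @(
    'case1 (BEF) ACT'
    '(AFT) BLK'
    'case2 (BEF) ACT'
    '(AFT) ACT'
    'case3 (BEF) ACT'
    '(AFT) BLK'
)

# Collect every (BEF) ACT line whose next line contains (AFT) BLK.
$pairs = Select-String -SimpleMatch -CaseSensitive '(BEF) ACT' $sample -Context 0,1 |
    ForEach-Object {
        $lineAfter = $_.Context.PostContext[0]
        if ($lineAfter -and $lineAfter.Contains('(AFT) BLK')) {
            $_.Line, $lineAfter
        }
    }

# Each pair is two lines, so the first pair is the first two output objects.
$firstPair = $pairs | Select-Object -First 2
```

To write the result to a file, pipe `$pairs` to Set-Content with an output path of your choice.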
Answer 1 (score: 1)
Another approach is to read $logFile as a single string and use regex matches to extract the parts you need:
$logFile = 'c:\temp\file.txt'
$outFile = 'c:\temp\file2.txt'
# read the content of the logfile as a single string
$content = Get-Content -Path $logFile -Raw
$regex = [regex] '(case\d+\s+\(BEF\)\s+ACT\s+\(AFT\)\s+BLK)'
$match = $regex.Match($content)
($output = while ($match.Success) {
    $match.Value
    $match = $match.NextMatch()
}) | Set-Content -Path $outFile -Force
The result is:
case1 (BEF) ACT
(AFT) BLK
case7 (BEF) ACT
(AFT) BLK
Regex details:
(         Match the regular expression below and capture its match into backreference number 1
  case    Match the characters “case” literally
  \d      Match a single digit 0..9
    +     Between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \s      Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
    +     Between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \(      Match the character “(” literally
  BEF     Match the characters “BEF” literally
  \)      Match the character “)” literally
  \s      Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
    +     Between one and unlimited times, as many times as possible, giving back as needed (greedy)
  ACT     Match the characters “ACT” literally
  \s      Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
    +     Between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \(      Match the character “(” literally
  AFT     Match the characters “AFT” literally
  \)      Match the character “)” literally
  \s      Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
    +     Between one and unlimited times, as many times as possible, giving back as needed (greedy)
  BLK     Match the characters “BLK” literally
)
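Answer 2 (score: 1)
My other answer completes your own Select-String-based solution attempt. Select-String is versatile but slow, though it processes files line by line and is therefore suitable for files that are too large to fit into memory as a whole.
A typically faster line-by-line alternative is switch -File; see the solution below.
Theo's helpful answer, which reads the entire file into memory first, may perform best overall, depending on file size, but it comes at the cost of increased complexity, because it relies heavily on direct use of .NET functionality.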
$(
  $firstLine = ''
  switch -CaseSensitive -Regex -File t.txt {
    '\(BEF\) ACT' { $firstLine = $_; continue }
    '\(AFT\) BLK' {
      # Pair found, output it.
      # If you don't want to look for further pairs,
      # append `; break` inside the block.
      if ($firstLine) { $firstLine, $_ }
      # Look for further pairs.
      $firstLine = ''; continue
    }
    default { $firstLine = '' }
  }
) # | Set-Content ...
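Note: The enclosing $(...) is only needed if you want to send the output directly through the pipeline to a cmdlet such as Set-Content. It is not needed for capturing the output in a variable: $pair = switch ...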
-Regex interprets the branch conditions as regular expressions.
$_ inside a branch's action script block ({ ... }) refers to the current line.
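The overall approach is:
$firstLine stores the first line of interest once it is found; when the second line's pattern is found and $firstLine is set (non-empty), the pair is output.
The default handler resets $firstLine, to ensure that only two consecutive lines that contain the strings of interest are considered.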
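The approach described above can be sketched as a self-contained run that captures the output in a variable, so the enclosing $(...) is not needed. The temporary file name and its content are made up for illustration; the second (BEF) ACT line is deliberately not followed directly by an (AFT) BLK line, to show the effect of the default handler.

```powershell
# Hypothetical sample data in a temporary file (not the asker's real file).
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'switch-sample.txt'
Set-Content -Path $sample -Value @(
    'case1 (BEF) ACT'
    '(AFT) BLK'
    'case2 (BEF) ACT'
    'unrelated line'
    '(AFT) BLK'
)

$firstLine = ''
$pairs = switch -CaseSensitive -Regex -File $sample {
    '\(BEF\) ACT' { $firstLine = $_; continue }
    '\(AFT\) BLK' {
        # Output the pair only if the previous line was a (BEF) ACT line.
        if ($firstLine) { $firstLine, $_ }
        $firstLine = ''; continue
    }
    # Any other line breaks the sequence, so forget the remembered line.
    default { $firstLine = '' }
}

$pairs
```

Only the case1 pair is output here, because the unrelated line between the case2 line and the last (AFT) BLK line resets $firstLine.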