How to compare two consecutive strings in a file

Date: 2019-04-16 12:01:45

Tags: powershell

I have a large file that contains a "before" and an "after" case for each item, as follows:

case1 (BEF) ACT
      (AFT) BLK
case2 (BEF) ACT
      (AFT) ACT
case3 (BEF) ACT
      (AFT) CLC
...

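I need to select all pairs of lines that have (BEF) ACT on the "first" line and (AFT) BLK on the "second" line, and put the results into a file.

The idea is to create a clause like this: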

IF (stringX.LineNumber consists of "(BEF) ACT" AND stringX+1.LineNumber consists of (AFT) BLK)
{OutFile $stringX+$stringX+1}

Sorry for the syntax, I've only just started using PS :)

$logfile = 'c:\temp\file.txt'
$matchphrase = '\(BEF\) ACT'
$linenum=Get-Content $logfile | Select-String $matchphrase | ForEach-Object {$_.LineNumber+1}
$linenum 
#I've worked out how to get a line number after the line with first required phrase

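Expected result: a new file that contains each line with "(BEF) ACT" that is followed by a line with "(AFT) BLK", together with that following line.

3 Answers:

Answer 0: (score: 1)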

Select-String -SimpleMatch -CaseSensitive '(BEF) ACT' c:\temp\file.txt -Context 0,1 |
  ForEach-Object {
    $lineAfter = $_.Context.PostContext[0]
    if ($lineAfter.Contains('(AFT) BLK')) {
      $_.Line, $lineAfter  # output
    }
  } # | Set-Content ...
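  • -SimpleMatch performs literal substring matching, which means you can pass the search string as-is, without having to escape it.

    • However, if you need to constrain the search further, for example to make sure the match occurs only at the end of a line ($), you do need to use a regular expression with the (implied) -Pattern parameter, without -SimpleMatch: '\(BEF\) ACT$'

    • Also note that PowerShell is case-insensitive by default, which is why the -CaseSensitive switch is used.

  • Note how Select-String accepts a file path directly, with no need for a preceding Get-Content call.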

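  • -Context 0,1 captures 0 lines before and 1 line after each match, and includes them in the [Microsoft.PowerShell.Commands.MatchInfo] instances that Select-String outputs.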

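  • In the ForEach-Object script block, $_.Context.PostContext[0] retrieves the line after the match, and .Contains() performs a literal substring search on it.

    • Note that .Contains() is a method of the .NET System.String type and that, unlike PowerShell, such methods are case-sensitive by default, although you can use an optional parameter to change that.
  • If the substring is found on the next line, both the current line and the next line are output.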

  • The code above finds all matching pairs in the input file; if you only want the first pair, append | Select-Object -First 2 to the end of the pipeline, after the ForEach-Object block.
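
As a rough end-to-end sketch of the points above: the following writes hypothetical sample data to a temporary file (the file names and the lowercase "blk" line are assumptions made for illustration), runs the same Select-String pipeline, and saves the pairs with Set-Content. It swaps .Contains() for .IndexOf() with an explicit System.StringComparison argument, since, as far as I know, the .Contains() overload that accepts a comparison is not available on the .NET Framework that Windows PowerShell runs on.

```powershell
# Hypothetical sample data and file names, for illustration only.
$inFile  = Join-Path ([System.IO.Path]::GetTempPath()) 'pairs-sample.txt'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'pairs-result.txt'
@'
case1 (BEF) ACT
      (AFT) BLK
case2 (BEF) ACT
      (AFT) ACT
case3 (BEF) ACT
      (AFT) blk
'@ | Set-Content -Path $inFile

# Switch to [System.StringComparison]::Ordinal for a case-sensitive check.
$comparison = [System.StringComparison]::OrdinalIgnoreCase

$pairs = Select-String -SimpleMatch -CaseSensitive '(BEF) ACT' $inFile -Context 0,1 |
  ForEach-Object {
    $lineAfter = $_.Context.PostContext[0]
    # $lineAfter is $null when the match is on the last line of the file.
    if ($lineAfter -and $lineAfter.IndexOf('(AFT) BLK', $comparison) -ge 0) {
      $_.Line, $lineAfter  # output the pair
    }
  }

$pairs | Set-Content -Path $outFile
```

With OrdinalIgnoreCase, both the case1 pair and the case3 pair (whose second line is lowercase in this sample) are written to the output file; with Ordinal, only the case1 pair would be.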

Answer 1: (score: 1)

Another approach is to read $logFile as a single string and use a regex match to get the parts you need:

$logFile = 'c:\temp\file.txt'
$outFile = 'c:\temp\file2.txt'

# read the content of the logfile as a single string
$content = Get-Content -Path $logFile -Raw

$regex = [regex] '(case\d+\s+\(BEF\)\s+ACT\s+\(AFT\)\s+BLK)'
$match = $regex.Match($content)
($output = while ($match.Success) {
    $match.Value
    $match = $match.NextMatch()
}) | Set-Content -Path $outFile -Force

When run, the result is:

case1 (BEF) ACT
      (AFT) BLK
case7 (BEF) ACT
      (AFT) BLK

Regex details:

(              Match the regular expression below and capture its match into backreference number 1
   case        Match the characters “case” literally
   \d          Match a single digit 0..9
      +        Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \s          Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
      +        Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \(          Match the character “(” literally
   BEF         Match the characters “BEF” literally
   \)          Match the character “)” literally
   \s          Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
      +        Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ACT         Match the characters “ACT” literally
   \s          Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
      +        Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \(          Match the character “(” literally
   AFT         Match the characters “AFT” literally
   \)          Match the character “)” literally
   \s          Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
      +        Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   BLK         Match the characters “BLK” literally
)
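
A more compact variant of the Match()/NextMatch() loop, offered only as a sketch: the static [regex]::Matches() method returns all matches at once. The in-memory sample text below is hypothetical and stands in for the result of Get-Content -Raw.

```powershell
# Hypothetical sample text standing in for: Get-Content -Path $logFile -Raw
$content = @'
case1 (BEF) ACT
      (AFT) BLK
case2 (BEF) ACT
      (AFT) ACT
case7 (BEF) ACT
      (AFT) BLK
'@

# Same pattern as above, without the outer capture group,
# because only the whole match is used here.
$pattern = 'case\d+\s+\(BEF\)\s+ACT\s+\(AFT\)\s+BLK'

# \s+ also matches line breaks, which is what lets one match span two lines.
$found = [regex]::Matches($content, $pattern) | ForEach-Object { $_.Value }

$found  # pipe to Set-Content to save the result
```

Because \s+ matches any whitespace, this pattern would also accept both strings on a single line; that is fine for the file layout shown in the question, but it is an assumption worth keeping in mind for other inputs.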

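Answer 2: (score: 1)

  • My other answer completes your own Select-String-based solution attempt. Select-String is versatile but slow, though it is well suited to processing files line by line, which lets it handle files that are too large to fit into memory as a whole.

    • However, PowerShell offers a much faster alternative for line-by-line processing:
      switch -File
      (see the solution below).
  • Theo's helpful answer, which reads the whole file into memory first, may perform best overall, depending on file size, but it comes at the cost of increased complexity, due to its heavy reliance on direct use of .NET functionality.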


$(
  $firstLine = ''
  switch -CaseSensitive -Regex -File t.txt {
    '\(BEF\) ACT' { $firstLine = $_; continue }
    '\(AFT\) BLK' { 
      # Pair found, output it.
      # If you don't want to look for further pairs, 
      # append `; break` inside the block.
      if ($firstLine) { $firstLine, $_ }
      # Look for further pairs.
      $firstLine = ''; continue
    }
    default { $firstLine = '' }
  } 
) # | Set-Content ...

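Note: The enclosing $(...) is only needed if you want to send the output directly through the pipeline to a cmdlet such as Set-Content. It is not needed for capturing the output in a variable: $pair = switch ...

  • -Regex interprets the branch conditions as regular expressions.

  • $_ inside a branch's action script block { ... } refers to the current line.

  • The overall approach is:

    • $firstLine stores the first line of interest once it is found; when the pattern for the second line is then found and $firstLine is set (non-empty), the pair is output.
    • The default handler resets $firstLine, to ensure that only two consecutive lines containing the strings of interest are considered.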
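
To illustrate the role of the default handler, here is a small self-contained sketch. The temporary file name and the sample data, including the "unrelated line" that separates the second pair, are assumptions made for this example.

```powershell
# Hypothetical sample file; case2 has an unrelated line between its two lines.
$inFile = Join-Path ([System.IO.Path]::GetTempPath()) 'switch-sample.txt'
@'
case1 (BEF) ACT
      (AFT) BLK
case2 (BEF) ACT
unrelated line
      (AFT) BLK
case3 (BEF) ACT
      (AFT) CLC
'@ | Set-Content -Path $inFile

$firstLine = ''
# Capturing in a variable, so no enclosing $(...) is needed.
$pairs = switch -CaseSensitive -Regex -File $inFile {
  '\(BEF\) ACT' { $firstLine = $_; continue }
  '\(AFT\) BLK' {
    if ($firstLine) { $firstLine, $_ }  # pair found, output it
    $firstLine = ''
    continue
  }
  default { $firstLine = '' }  # any other line breaks the pairing
}

$pairs
```

Only the case1 pair is output: for case2, the unrelated line triggers the default branch and clears $firstLine before the (AFT) BLK line is reached, and case3's second line matches neither pattern.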