PowerShell regex: reading a multi-line string between two points

Asked: 2015-01-14 12:19:24

Tags: regex powershell

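I have a fairly common problem: I need a PowerShell regular expression that reads multi-line records. I have read the threads that ask similar questions, but I could not get their solutions to work in my case.

My file contains multi-line records of variable length. The records I am interested in start with 01 or 02, followed by V or M. A record ends when another record starts, or when a batch record starting with 50 is found. The first three characters of each line identify the record.

i.e. 01V (start of record, content follows), then lines starting with 01

I am trying to read individual records with a regular expression by identifying where each one starts and ends.

What I have so far is based on this answer: Match everything between two words in Powershell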

#Read the file as a single string
$FilePath = "blaablaablaa"
$TestFile = get-content $FilePath | Out-String 

#(?<= Assert that this matches before the current position
# 0(1|2)(V|M) if the record is 01V or 01M or 02V or 02M 
# ) End assertion 
# .+? Match any number of characters, few as possible
# (?= Until a record starting with 70 is found  
# ) End of look ahead
$regex = [regex] '(?is)(?<=0(1|2)(V|M)).+?(?=70)'
echo $TestFile |  select-string -Pattern $regex 

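The above works on a single-line string if I remove the pipe to Out-String; with the Out-String pipe it returns the entire file. I am guessing that I am not handling the \n character correctly.

Any suggestions? The input file looks roughly like this: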

  

00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal

The desired output is the file split into individual records, which are delimited by '50', by '90', or by the start of another record. For example, this is the final record:

  

01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx

2 answers:

Answer 0 (score: 1)

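Assuming (per your description) that you also want to match from one 01M up to the next 01M, and otherwise up to the 50, this should do it:

(?gmis)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+

Explanation: after matching 0, then 1 or 2, then V or M, the part inside (?:...) is:

[^\n]|\n(?!0[12][VM]|50|90)

which means:

any character that is not a newline

or a newline that is not followed (?!...) by a new record, a 50, or a 90

online Regex101 demo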
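
As a rough sketch of how this pattern might be applied from PowerShell (not part of the original answer): .NET does not accept `g` as an inline option, so the version below uses `(?mi)` and relies on `[regex]::Matches` to return every match. The `s` option is dropped because the pattern contains no `.`. The variable names `$text`, `$pattern`, and `$records`, and the shortened sample lines, are assumptions made for illustration.

```powershell
# Shortened, hypothetical sample joined with explicit "`n" line endings,
# so the result does not depend on how a file was saved.
$text = @(
    '00 date'
    '01Mxxxx'
    '01=5xxx'
    '01Mxxxx'
    '01=9xxx'
    '50 xxxx'
    '01Vxxxx'
    '01$1 xxx'
    '01$5xxx'
    '50 xxxx BatchTotal'
    '90 xxxx FILETotal'
) -join "`n"

# (?m) lets ^ match at the start of every line; (?i) ignores case.
# A record keeps consuming characters, and newlines that are NOT
# followed by a new record header (0[12][VM]), a 50, or a 90.
$pattern = '(?mi)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+'

# [regex]::Matches returns all matches, so no "global" flag is needed.
$records = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value }

$records.Count   # 3
$records[-1]     # the 01V record, three lines
```

For a real file, `Get-Content $FilePath -Raw` (PowerShell 3.0 and later) reads the whole file as a single string. If the file uses CRLF line endings, `[^\n]` also matches the `\r`, so each matched line keeps its trailing carriage return.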

Answer 1 (score: 0)

Using your test data:

@'
00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal
'@ | set-content testfile.txt


$Text = Get-Content ./testfile.txt -Raw

$regex = @'
(?ms)(01(?:M|V).+?)
(?:5|9)0.+?
'@


$Records = 
[regex]::Matches($Text,$regex) |
foreach {$_.groups[1].value}

$Records[-1]

01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
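
One thing that may be worth checking with this pattern is what each capture contains, because it only stops at a line starting with 50 or 90. The sketch below is an assumption-laden variation, not the answer's exact code: it uses shortened hypothetical data, writes the line break as `\n` instead of a literal here-string newline, and introduces the names `$pattern`, `$records`, and `$lineCounts` for illustration. It counts the lines in each captured record.

```powershell
# Shortened, hypothetical sample joined with explicit "`n" line endings.
$text = @(
    '00 date'
    '01Mxxxx'
    '01=5xxx'
    '01Mxxxx'
    '01=9xxx'
    '50 xxxx'
    '01Vxxxx'
    '01$1 xxx'
    '01$5xxx'
    '50 xxxx BatchTotal'
    '90 xxxx FILETotal'
) -join "`n"

# Same idea as the pattern above: capture lazily from 01M/01V up to a
# line break that is followed by 50 or 90.
$pattern = '(?ms)(01(?:M|V).+?)\n(?:5|9)0.+?'

$records = [regex]::Matches($text, $pattern) |
    ForEach-Object { $_.Groups[1].Value }

# Number of lines in each captured record.
$lineCounts = $records | ForEach-Object { ($_ -split "`n").Count }

$records.Count   # 2
$lineCounts      # 4, then 3
```

With this sample, the two consecutive 01M records come back as a single four-line capture, since nothing in the pattern ends a record at the start of the next 01M. The last capture is the 01V record on its own, which matches the output shown above.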