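How to remove a multi-line block of text matching $pattern in PowerShell

Time: 2019-04-05 14:58:49

Tags: powershell

I'm getting the contents of a text file which is partly created by gsutil, and I'm trying to put its contents in $body, but I want to omit a block of text that contains special characters. The problem is that I'm not able to match this block of text in order to remove it. So when I print out $body, it still contains all the text that I'm trying to omit.

Here's a part of my code: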

$pattern = @"
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this you and any
users that download such composite files will need to have a compiled
crcmod installed (see "gsutil help crcmod").
"@

$pattern = ([regex]::Escape($pattern))

$body = Get-Content -Path C:\temp\file.txt -Raw | Select-String -Pattern $pattern -NotMatch

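So basically I need it to display everything inside the text file except for the block of text in $pattern. I tried without -Raw and without ([regex]::Escape($pattern)), but it won't remove that entire block of text.

It has to be because of the special characters, probably the " , . ( ), because if I make the pattern simple, such as: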

$pattern = @"
NOTE: You are uploading one or more
"@

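then it works and this part of the text is removed from $body.

It'd be nice if everything inside $pattern between the @" and "@ was treated literally. I'd like the simplest solution, without functions etc. I'd really appreciate it if someone could help me with this.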

2 Answers:

Answer 0 (score: 1)

With the full text of the question stored in the file .\SO_55538262.txt

this script, which uses a manually escaped pattern:

$pattern = '(?sm)^==\> NOTE: You .*?"gsutil help crcmod"\)\.'

$body = (Get-Content .\SO_55538262.txt -raw) -replace $pattern
$body

returns this:

I'm getting the contents of a text file which is partly created by gsutil and I'm trying to put its contents in $body but I want to omit a block of text that contains special characters. The problem is that I'm not able to match this block of text in order for it to be removed. So when I print out $body it still contains all the text that I'm trying to omit.

Here's a part of my code:

$pattern = @"

"@

$pattern = ([regex]::Escape($pattern))

$body = Get-Content -Path C:\temp\file.txt -Raw | Select-String -Pattern $pattern -NotMatch

So basically I need it to display everything inside the text file except for the block of text in $pattern. I tried without -Raw and without ([regex]::Escape($pattern)) but it won't remove that entire block of text.

It has to be because of the special characters, probably the " , . () because if I make the pattern simple such as:

$pattern = @"
NOTE: You are uploading one or more
"@

then it works and this part of text is removed from $body.

It'd be nice if everything inside $pattern between the @" and "@ was treated literally. I'd like the simplest solution without functions, etc.

RegEx explanation from regex101.com:

(?sm)^==\> NOTE: You .*?"gsutil help crcmod"\)\.

(?sm) match the remainder of the pattern with the following effective flags: gms  
s modifier: single line. Dot matches newline characters  
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)   
^ asserts position at start of a line  
== matches the characters == literally (case sensitive)  
\> matches the character > literally (case sensitive)  
 NOTE: You matches the characters  NOTE: You literally (case sensitive)
.*?  
. matches any character 
*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)  
"gsutil help crcmod" matches the characters "gsutil help crcmod" literally (case sensitive)  
\) matches the character ) literally (case sensitive)  
\. matches the character . literally (case sensitive)  
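
If hand-escaping the pattern is not appealing, the block can probably also be kept as a literal here-string. Two things seem worth noting about the original attempt: Select-String -NotMatch applied to a single -Raw string can only emit that whole string or nothing, so it never cuts a part out of it; and an escaped multi-line pattern only matches when its line endings agree with those in the file. The sketch below escapes each line separately and joins the lines with \r?\n so that either CRLF or LF matches. The $text sample and the shortened two-line $block are made up for illustration, and the line-ending mismatch is an assumption, not something the question confirms.

```powershell
# Hypothetical sample standing in for: Get-Content -Path C:\temp\file.txt -Raw
$text = "first line`r`n==> NOTE: line one`r`nsecond (see `"gsutil help crcmod`").`r`nlast line`r`n"

# Single-quoted here-string: nothing inside it is expanded or interpreted.
# (Shortened, made-up block; put the real text to remove here.)
$block = @'
==> NOTE: line one
second (see "gsutil help crcmod").
'@

# Escape every line on its own, then join the lines with a pattern
# that accepts both Windows (CRLF) and Unix (LF) line endings.
$pattern = (($block -split '\r?\n') | ForEach-Object { [regex]::Escape($_) }) -join '\r?\n'

# Remove the block and, if present, the single line break that follows it.
$body = $text -replace ($pattern + '(\r?\n)?'), ''
$body
```

With the real file, $text would come from Get-Content -Raw as in the question, and $block would hold the complete gsutil note.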

Answer 1 (score: 0)

A simple way to solve this task (without using regex) is to use the -notin operator, since Get-Content returns the file content as a string[]:

#requires -Version 4

$set = @('==> NOTE: You are uploading one or more large file(s), which would run'
'significantly faster if you enable parallel composite uploads. This'
'feature can be enabled by editing the'
'"parallel_composite_upload_threshold" value in your .boto'
'configuration file. However, note that if you do this you and any'
'users that download such composite files will need to have a compiled'
'crcmod installed (see "gsutil help crcmod").')

$filteredContent = @(Get-Content -Path $path).
    Where({ $_.Trim() -notin $set }) # trim added for misc whitespace

A v2-compatible solution:

@(Get-Content -Path $path) |
    Where-Object { $set -notcontains $_.Trim() }
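
To see what the line-based filter does without touching a real file, here is a small sketch that uses an in-memory array in place of Get-Content. The sample lines and the shortened $set are made up. Note that this approach drops every line that equals an entry in $set wherever it appears, not only when the lines occur together as one contiguous block.

```powershell
# Hypothetical stand-in for: @(Get-Content -Path $path)
$lines = @(
    'first line'
    '==> NOTE: You are uploading one or more large file(s), which would run'
    '  crcmod installed (see "gsutil help crcmod").  '   # extra whitespace on purpose
    'last line'
)

# Shortened set of lines to drop (the real set has all seven lines)
$set = @(
    '==> NOTE: You are uploading one or more large file(s), which would run'
    'crcmod installed (see "gsutil help crcmod").'
)

# Keep only the lines whose trimmed text is not in the set
$filtered = @($lines | Where-Object { $set -notcontains $_.Trim() })
$filtered
```

Here $filtered should contain only 'first line' and 'last line'.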