Question

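I have written my own CSS minifier for fun and profit (well, not so much profit), and it works well. I am now trying to streamline it, because I am essentially filtering the file 10+ times. That is not a big problem for a small file, but the bigger the files get, the worse the performance.

Is there a more elegant way to filter my input file? I assume regex has a way to do it, but I am no regex wizard...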

$a = (gc($path + $file) -Raw)
$a = $a -replace "\s{2,100}(?<!\S)", ""
$a = $a -replace " {",    "{"
$a = $a -replace "} ",    "}"
$a = $a -replace " \(",   "\("
$a = $a -replace "\) ",   "\)"
$a = $a -replace " \[",   "\["
$a = $a -replace "\] ",   "\]"
$a = $a -replace ": ",    ":"
$a = $a -replace "; ",    ";"
$a = $a -replace ", ",    ","
$a = $a -replace "\n",    ""
$a = $a -replace "\t",    ""

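To save you a bit of a headache: the first -replace strips any run of consecutive whitespace between 2 and 100 characters long. The remaining replace statements clean up single spaces in specific situations.

How can I combine these so that I am not filtering the file 12 times?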

Answer 1

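The negative lookbehind (?<!\S): (?<!prefix)thing matches thing only when it does not have prefix on its left. When you put it at the end of the regex, with nothing after it, I think it does nothing. You might have intended it to go on the left, or you might have intended a negative lookahead. I won't try to guess; I am just removing it for this answer.
You are missing the use of character classes. abc looks for the literal text abc, but put it in square brackets and [abc] looks for any one of the characters a, b, c.
1. Using that, you can merge the last two lines into one: [\n\t], which replaces either a newline or a tab.
You can combine the two separate (replace with nothing) rules into one match using the regex logical OR |: \s{2,100}|[\n\t] matches the run of whitespace, or a newline, or a tab. (You could use OR twice instead of the character class, fwiw.)
Use regex capture groups, which let you refer to whatever the regex matched without knowing in advance what that was.
1. e.g. "space bracket -> bracket", "space colon -> colon" and "space comma -> comma" all follow the general pattern "space (thing) -> (thing)". The same goes for trailing spaces: "(thing) space -> (thing)".
2. Combine capture groups with character classes to merge all the remaining lines into one.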
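The first two points (character classes and OR) can be checked on a small string. This is only a minimal sketch; the `$sample` value is a made-up CSS fragment, not taken from the question's file.

```powershell
# Hypothetical sample: a newline, a tab and a run of four spaces to strip.
$sample = "a {`n`tcolor:    red;`n}"

# Character class: a single pass removes every newline or tab.
$noBreaks = $sample -replace '[\n\t]', ''

# Alternation: a run of 2-100 whitespace characters OR a single newline/tab.
$stripped = $sample -replace '\s{2,100}|[\n\t]', ''

$noBreaks   # a {color:    red;}
$stripped   # a {color:red;}
```

The single space before the brace survives both passes, which is why the capture-group rules are still needed.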

For example, building the capture-group pattern up step by step:

$a -replace " (:)", '$1'      # capture the colon; the replacement is not ':',
                              # it is "whatever was in the capture group"

$a -replace " ([:,])", '$1'   # capture the colon, or comma. Replacement
                              # is "whatever was in the capture group"
                              # space colon -> colon, space comma -> comma

# make the space optional with \s{0,1} and put it at the start and end:
#   \s{0,1}([:,])\s{0,1}
# now it will match "space (thing)" or "(thing) space"

# Add in the rest of the characters, with appropriate \ escapes
# gained from [regex]::Escape('those chars here')

# Your original:
$a = (gc D:\css\1.css -Raw)
$a = $a -replace "\s{2,100}(?<!\S)", ""
$a = $a -replace " {",    "{"
$a = $a -replace "} ",    "}"
$a = $a -replace " \(",   "\("
$a = $a -replace "\) ",   "\)"
$a = $a -replace " \[",   "\["
$a = $a -replace "\] ",   "\]"
$a = $a -replace ": ",    ":"
$a = $a -replace "; ",    ";"
$a = $a -replace ", ",    ","
$a = $a -replace "\n",    ""
$a = $a -replace "\t",    ""

# My version:
$b = gc d:\css\1.css -Raw
$b = $b -replace "\s{2,100}|[\n\t]", ""
$b = $b -replace '\s{0,1}([])}{([:;,])\s{0,1}', '$1'

# Test that they both do the same thing on my random downloaded sample file:
$b -eq $a

# Yep.

Again with another OR, to combine the two into one:

$c = gc d:\css\1.css -Raw
$c = $c -replace "\s{2,100}|[\n\t]|\s{0,1}([])}{([:;,])\s{0,1}", '$1'
$c -eq $a   # also same output as your original.

NB. The whitespace run, tab and newline alternatives capture nothing, so '$1' is empty for them, which removes them.

And you can spend lots of time building your own unreadable regex, which probably won't be noticeably faster in any real scenario. :)

NB. In the replacement '$1', the dollar is .NET regex engine syntax, not a PowerShell variable. If you use double quotes, PowerShell will string-interpolate the variable $1 and likely replace it with nothing.
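The quoting point is easy to see side by side. A small sketch, assuming `$1` is not defined as a PowerShell variable in the session and strict mode is off; the `$css` value is a made-up input.

```powershell
# Hypothetical input for illustration.
$css = 'a , b'

# Single quotes: the regex engine receives the literal text $1 and
# substitutes the captured comma.
$single = $css -replace '\s?(,)\s?', '$1'

# Double quotes: PowerShell expands $1 as a variable first. It is normally
# unset, so the replacement becomes an empty string and the comma is lost.
$double = $css -replace '\s?(,)\s?', "$1"

# A backtick keeps the dollar literal inside double quotes.
$escaped = $css -replace '\s?(,)\s?', "`$1"

$single    # a,b
$double    # ab
$escaped   # a,b
```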

Answer 2

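You can use capture groups to join the similar patterns into one bigger expression, and pass a callback to the Regex Replace method; inside the callback you can inspect the match and take the appropriate action.

Here is a solution for your scenario that you can extend: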
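As a minimal sketch of the callback mechanism on its own: a script block passed to Replace is converted to a MatchEvaluator and called once per match. The pattern and the hex-lowercasing rule below are hypothetical, chosen only because a plain '$1' replacement could not do the same thing.

```powershell
# Hypothetical pattern: group 1 = a hex colour, group 2 = ':' or ';'
# that is followed by whitespace.
$rx = [regex]'(#[0-9A-Fa-f]{3,6})|([:;])\s+'

# Called once per match; whatever the script block outputs replaces the match.
$evaluator = {
    param($match)
    if ($match.Groups[1].Success) { $match.Groups[1].Value.ToLower() }  # normalise the colour
    else { $match.Groups[2].Value }                                     # drop the trailing space
}

$result = $rx.Replace('color: #FFAA00; ', $evaluator)
$result   # color:#ffaa00;
```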

$callback = {  param($match) 
  if ($match.Groups[1].Success -eq $true) { "" }
  else { 
    if ($match.Groups[2].Success -eq $true) { $match.Groups[2].Value }
    else {
      if ($match.Groups[3].Success -eq $true) { $match.Groups[3].Value }
      else {
        if ($match.Groups[4].Success -eq $true) { $match.Groups[4].Value }
      }
    }
  }
}
$path = "d:\input\folder\"
$file = "input_file.txt"
$a = [IO.File]::ReadAllText($path + $file)
$rx = [regex]'(\s{2,100}(?<!\S)|[\n\t])|\s+([{([])|([])}])\s+|([:;,])\s+'
$rx.Replace($a, $callback) | Out-File "d:\result\file.txt"

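Pattern details:

(\s{2,100}(?<!\S)|[\n\t]) - Group 1 captures 2 to 100 whitespace characters, followed by a lookbehind that rejects a preceding non-whitespace character (this lookbehind is probably redundant), or a newline or tab
| - or
\s+([{([]) - matches one or more whitespace characters (\s+) without capturing them, then captures into Group 2 any single character from the [{([] character class: {, ( or [
|([])}])\s+ - or Group 3 captures any single character from the [])}] character class: ], ) or }, and then one or more whitespace characters are matched
|([:;,])\s+ - or Group 4 captures any single character from the [:;,] character class (:, ; or ,), and then one or more whitespace characters are matched.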
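To see the groups in action, the pattern can be run against a short string. This is a sketch, not the answerer's code: the pattern is copied from the answer, the `$css` fragment is made up, and the callback is a flattened variant that loops over groups 2-4 instead of nesting if statements.

```powershell
# Pattern copied from the answer above.
$rx = [regex]'(\s{2,100}(?<!\S)|[\n\t])|\s+([{([])|([])}])\s+|([:;,])\s+'

# Flattened callback: return whichever of groups 2-4 took part in the match;
# if none did, group 1 matched, so the match is deleted.
$callback = {
    param($match)
    foreach ($i in 2..4) {
        if ($match.Groups[$i].Success) { return $match.Groups[$i].Value }
    }
    ''
}

# Hypothetical CSS fragment for illustration.
$css = "a {`n`tcolor: red;`n`tmargin: 0, 1;`n}`n"
$min = $rx.Replace($css, $callback)
$min   # a{color:red;margin:0,1;}
```

Adding a new rule then means adding one alternative with its own capture group and widening the 2..4 range.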

RegEx in PowerShell, combining replace calls
