Question

Yesterday I posted a question: Perl Regular expression remove double tabs, line breaks, white spaces

With your generous help I was able to solve the problem. First, the solution:

$txt="\nRemove empty line at beg.".
     "\n\nRemove double empty line,      double spaces and ending space: \n".
     "\n and beginning. Same for tabs\t\n".
     "\tSame for Tab at beginning and multiple tabs \t\t\t and line break at end:\n\n\n";

# Works
$txt=~s/\r//gs; # * this is needed for actual $txt which may contain \r

# Following *should* replace it not with 1 space, but with 1 space or \t depending on input
$txt =~ s/[\t ]+/ /gs; # Replace duplicate whitespace mid-string with 1 space

$txt =~ s/[\t ]*$//gms; # Remove ending spaces/tabs
$txt =~ s/^[\t ]*//gms; # Remove starting spaces/tabs

$txt=~s/\n+/\n/gs;      # replace all runs of > 1 \n with a single \n

# clearly redundant
$txt =~ s/^$//ms;       # Remove completely empty lines ** does not work **
$txt =~ s/^\n//ms;       # Remove completely empty lines (beg.)
$txt =~ s/\n$//ms;       # Remove completely empty lines (end.)

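This works, but it isn't pretty.

So I'd like to ask a few things:

1. How could I write this as a one-liner? I would still like to keep the comments, but I think doing it in this many lines is very inefficient. Maybe I'm wrong, in which case never mind.
2. It works, but I feel it is nowhere near perfect. I don't need it to be perfect, but I'd like to understand regular expressions better. So: any suggestions on doing it better? That is, is anything here redundant or superfluous?
3. Are there any regex tutorials on the web that introduce all the regex possibilities and then provide training tasks for them?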

Answer 1

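I would leave it as a series of regexes. I don't think it would be any more computationally efficient as a one-liner. Complex regexes can require a lot of backtracking. In any case, it would be harder to maintain. I'd just wrap it in a squeeze_whitespace routine.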
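A minimal sketch of what such a wrapper might look like, assuming the steps from the question are kept in order and the anchor fixes discussed in this answer are applied. The body of squeeze_whitespace here is an illustration, not the asker's final code:

```perl
use strict;
use warnings;

# Hypothetical wrapper: one small regex per step, each with its own comment.
sub squeeze_whitespace {
    my ($txt) = @_;

    $txt =~ s/\r//g;            # drop carriage returns
    $txt =~ s/([\t ])+/$1/g;    # squeeze each run of tabs/spaces to its last character
    $txt =~ s/[\t ]+$//gm;      # strip trailing spaces/tabs on every line
    $txt =~ s/^[\t ]+//gm;      # strip leading spaces/tabs on every line
    $txt =~ s/\n+/\n/g;         # collapse runs of newlines into one
    $txt =~ s/\A\n//;           # remove a newline at the very start of the string
    $txt =~ s/\n\z//;           # remove a newline at the very end of the string

    return $txt;
}

print squeeze_whitespace("\n  this \t that  \n\n\nup\n"), "\n";
```

Keeping each step on its own line preserves the comments the question asks for, and the routine gives the tests a single entry point.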

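$txt =~ s/^$//ms; doesn't work because $ matches at the end of the string before a newline. So on a line that contains only a newline, the pattern matches the empty string in front of that newline and never the newline itself.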
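A small sketch of that behaviour, using a made-up three-line string: the substitution reports a match, yet the string is left exactly as it was, because the match covers zero characters.

```perl
use strict;
use warnings;

my $txt  = "foo\n\nbar";   # illustrative input with one empty line in the middle
my $copy = $txt;

# With /m, ^ and $ both match at the start of the empty line,
# so the substitution succeeds but replaces nothing.
my $matched = ($copy =~ s/^$//ms) ? 1 : 0;

print "matched: $matched\n";
print "string changed: ", ($copy eq $txt ? "no" : "yes"), "\n";
```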

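$txt =~ s/^\n//ms; isn't restricted to blank lines at the beginning of the string, because /m changes ^ and $ to match at the beginning and end of any line. You got lucky that your test data starts with a newline, so it matches there and stops. The same goes for the next one. Use \A and \z, or don't spam /ms.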
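To make the difference concrete, here is a short sketch on a made-up string that does not start with a newline. The variable names are illustrative only:

```perl
use strict;
use warnings;

my $txt = "foo\n\nbar\n";   # blank line in the middle, newline at the end

# ^ and $ under /m anchor to any line, so these touch the middle of the string.
(my $any_line_start = $txt) =~ s/^\n//ms;
(my $any_line_end   = $txt) =~ s/\n$//ms;

# \A and \z anchor to the string itself.
(my $string_start = $txt) =~ s/\A\n//;
(my $string_end   = $txt) =~ s/\n\z//;

for my $pair (
    [ '^\n  with /ms', $any_line_start ],
    [ '\n$  with /ms', $any_line_end ],
    [ '\A\n',          $string_start ],
    [ '\n\z',          $string_end ],
) {
    (my $shown = $pair->[1]) =~ s/\n/\\n/g;   # make newlines visible when printing
    printf "%-14s => %s\n", $pair->[0], $shown;
}
```

Both /ms versions remove a newline from the middle of the string, while the \A version leaves the string alone and the \z version removes only the final newline.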

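$txt =~ s{([\t ])+}{$1}g will replace a run of tabs or spaces with just one tab or space. However, with something like "this \t that", it will pick the last character of the run.

Which brings us to testing.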
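A quick sketch of the "last character wins" effect. The helper names are hypothetical, and squeeze_keep_first is an assumed alternative (not part of the answer) for the case where the first character of the run should be kept instead:

```perl
use strict;
use warnings;

# A quantified capture group keeps only its final iteration in $1.
sub squeeze_keep_last {
    my ($s) = @_;
    $s =~ s{([\t ])+}{$1}g;
    return $s;
}

# Assumed alternative: capture the first character, then match the rest of the run.
sub squeeze_keep_first {
    my ($s) = @_;
    $s =~ s{([\t ])[\t ]*}{$1}g;
    return $s;
}

for my $in ("this \tthat", "this\t that") {
    (my $shown_in    = $in)                     =~ s/\t/\\t/g;
    (my $shown_last  = squeeze_keep_last($in))  =~ s/\t/\\t/g;
    (my $shown_first = squeeze_keep_first($in)) =~ s/\t/\\t/g;
    print "$shown_in => last: $shown_last, first: $shown_first\n";
}
```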

use Test::More;

note "test tab and whitespace squeezing"; {
    is squeeze_whitespace("this that"), "this that";
    is squeeze_whitespace("this\t\tthat"), "this\tthat";
    is squeeze_whitespace("this \t that up "), "this that up";
}

note "test begin/end newline stripping"; {
    is squeeze_whitespace("\nfoo\n"), "foo", "newlines removed from the start and end";
    is squeeze_whitespace("foo\nbar"), "foo\nbar", "newlines not eaten if there's no newline at the start";
}

And so on.
