Question

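I recently wrote a small Perl script to trim whitespace from the end of lines and ran into unexpected behavior. I decided that Perl must be including the line-ending characters when it splits the input into lines, so I tested that theory and got even more unexpected behavior. "I do not" should match either \s+$ or t$ ... not both. Very confused. Can anybody enlighten me?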

£ cat example
I have space after me
I do not
£ perl -ne 'print if /\s+$/' example
I have a space after me
I do not
£ perl -ne 'print if /t$/' example
I do not
£

A PCRE tester gives the expected results. I also tried the /m modifier, with no change in behavior.

Edit, for completeness:

£ perl -ne 'print if /e$/' example
£

The behavior I expected from perl -ne 'print if ...' is the same as that of grep -P:

£ grep -P '\s+$' example
I have a space after me
£

Reproducible under Ubuntu 16.04 with perl v5.22.1 (both the 60 and 68 patch versions) and under MINGW with perl v5.26.1.

Answer 1

You see this behavior because, in the example file, the second line ends with a \n character, and \n is whitespace that \s matches.

From perlretut:

no modifiers: Default behavior. ... '$' matches only at the end or before a newline at the end.

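In your regex, \s matches a whitespace character, the set [\ \t\v\r\n\f]. In other words, it matches both the space and the \n character. Then $ matches the end of the line (no character, just the position itself), in the same way that the word anchor \b matches a word boundary and ^ matches the beginning of the line rather than its first character. So for the line "I do not\n", /t$/ matches because $ can sit just before the final newline, and /\s+$/ matches because \s+ consumes that newline and $ then matches at the very end of the string.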

You can rewrite your regex like this:

/[\t ]+$/

If the second line did not end with a \n character, the contents of example would look like this:

£ cat example
I have space after me
I do not£

Note that the shell prompt £ does not appear on a new line.

The results differ because grep abstracts away line endings, much as Perl's -l flag does. (grep -P '\n' returns no results on a text file where grep -Pz '\n' does.)

Answer 2

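Your problem stems from the combination of the -n option and \s. The -n flag reads the input line by line into $_ (with the trailing newline still attached, since -l is not given) and then runs the print if match statement on each line.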

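In your match, you use the $ anchor to match the end of the line. The anchor is purely positional and does not consume the newline or any other character.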

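Try it yourself with \s+: the regex matches the same number of characters whether or not you add the $.
This is because \s is equivalent to [\r\n\t\f\v ] and matches any whitespace character, and you added the + quantifier. It therefore matches between one and unlimited times, as many times as possible (greedy).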

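If you only want to find trailing space characters, you are good to go with [ ]+$ (the space is escaped here by wrapping it in a character class):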

£ perl -ne 'print if /[ ]+$/' example

This way the \n that \s would match is left alone. Try it yourself.

Bonus:

Here are some commonly used Perl one-liners for trimming whitespace:

# Strip leading whitespace (spaces, tabs) from the beginning of each line
perl -ple 's/^[ \t]+//'
perl -ple 's/^\s+//'

# Strip trailing whitespace (space, tabs) from the end of each line
perl -ple 's/[ \t]+$//'

# Strip whitespace from the beginning and end of each line
perl -ple 's/^[ \t]+|[ \t]+$//g'

Line anchor behavior of Perl regular expressions
