Question

Apologies in advance, this may be a bit hard to read...

I'm trying to parse a line (actually a subject line from an IMAP server) that looks like this:

=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?=

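It's a bit hard to see, but there are two =?/?= pairs in the line above. (There will always be one pair; in theory there can be many.) Within each =?/?= pair, I want to extract the third argument (as delimited by the ? separators). (In the first pair it's "Here is som"; in the second it's "e text.")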

Here's the regular expression I'm using:

=\?(.+)\?.\?(.*?)\?=

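I expected it to return two matches, one for each =?/?= pair. Instead, it returns the whole line as a single match. I thought the ? in (.*?) would make the * operator lazy and prevent this, but evidently it doesn't.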

Any suggestions?

EDIT: Following the suggestion below to replace ".*?" with "[^(\?=)]*?", I'm now trying:

=\?(.+)\?.\?([^(\?=)]*?)\?=

...but that doesn't work either. (I'm not sure whether [^(\?=)]*? is the right way to exclude a two-character sequence such as "?=". Is it?)

Answer 1

Try this:

\=\?([^?]+)\?.\?(.*?)\?\=

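I changed .+ to [^?]+, which means "one or more of anything except ?". The first group can then no longer run past the ? that ends the charset field.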
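To see why this change matters, here is a minimal Python sketch. Python's `re` module treats greedy quantifiers, lazy quantifiers, and negated character classes the same way Perl does; the variable names are illustrative and not from the original post. It runs the original pattern and the `[^?]+` version against the sample subject line.

```python
import re

# Sample subject line from the question.
subject = "=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?="

# Original pattern: the greedy (.+) runs past the first pair's closing ?=.
greedy = re.compile(r"=\?(.+)\?.\?(.*?)\?=")
# Answer 1's pattern: [^?]+ cannot cross a '?', so group 1 stops at the charset.
fixed = re.compile(r"=\?([^?]+)\?.\?(.*?)\?=")

greedy_hits = [(m.group(1), m.group(2)) for m in greedy.finditer(subject)]
fixed_hits = [(m.group(1), m.group(2)) for m in fixed.finditer(subject)]

print(greedy_hits)  # one match; group 1 has swallowed most of the line
print(fixed_hits)   # two matches, one per =?...?= pair
```

The lazy second group was never the problem: the greedy first group had already consumed the first pair before the second group got a chance to match.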

Answer 2

One solution:

=\?(.*?)\?=\s*=\?(.*?)\?=

Explanation:

=\?    # Literal characters '=?'
(.*?)  # Lazily match as few characters as possible, up to the next '?='
\?=    # Literal characters '?='
\s*    # Match optional whitespace between the two pairs
=\?    # Literal characters '=?'
(.*?)  # Lazily match as few characters as possible, up to the next '?='
\?=    # Literal characters '?='

Tested in a Perl program:

use warnings;
use strict;

while ( <DATA> ) { 
    printf qq[Group 1 -> %s\nGroup 2 -> %s\n], $1, $2 if m/=\?(.*?)\?=\s*=\?(.*?)\?=/;
}

__DATA__
=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?=

Run it:

perl script.pl

Output:

Group 1 -> utf-8?Q?Here is som
Group 2 -> utf-8?Q?e text.

Edit, in response to the comments:

I would use the global modifier, /.../g. The regular expression would be:

/=\?(?:[^?]*\?){2}([^?]*)/g

Explanation:

=\?              # Literal characters '=?'
(?:[^?]*\?){2}   # Any run of characters except '?', followed by a '?'. Done twice to skip the prefix 'utf-8?Q?'
([^?]*)          # Capture the following characters up to the next '?'
/g               # Repeat the match as many times as possible until the end of the string

Tested in a Perl script:

use warnings;
use strict;

while ( <DATA> ) {
    printf qq[Group -> %s\n], $1 while m/=\?(?:[^?]*\?){2}([^?]*)/g;
}

__DATA__
=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?= =?utf-8?Q?more text?=

Running it gives:

Group -> Here is som
Group -> e text.
Group -> more text
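For readers not working in Perl, the same global match can be sketched in Python. The function name `extract_payloads` is hypothetical; `re.findall` plays the role of the `/g` loop and returns the contents of the single capture group for every match.

```python
import re

# Same pattern as the Perl version: skip two '?'-terminated fields
# (charset and encoding), then capture everything up to the next '?'.
PAYLOAD = re.compile(r"=\?(?:[^?]*\?){2}([^?]*)")

def extract_payloads(line):
    """Return the third field of every =?...?= pair found in line."""
    return PAYLOAD.findall(line)

sample = "=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?= =?utf-8?Q?more text?="
payloads = extract_payloads(sample)
for payload in payloads:
    print("Group ->", payload)
```

Like the Perl original, this pattern does not check for the closing ?=; it simply stops the capture at the next ?.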

Answer 3

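In my experience, a good practice is not to use .*? at all, but to use * without the ? and tighten the character class instead. In this case, [^?]* matches a run of non-question-mark characters.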

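You can match more complex terminators this way too. Here, for example, your terminator is ?=, so you want to match non-question-mark characters, as well as question marks that are followed by something other than an equals sign: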

([^?]*\?[^=])*[^?]*

At this point the choice gets harder. I like that this solution is stricter, but in this case it comes at the cost of readability.
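As a rough illustration of this stricter approach, the Python sketch below embeds the expression above in a full pattern. The surrounding `=\?[^?]*\?.\?` prefix and `\?=` suffix, and the second sample string, are assumptions added for the demonstration and are not part of the answer.

```python
import re

# Payload = any mix of non-'?' characters and '?' followed by a non-'=',
# so the capture can contain a '?' but cannot run across a closing '?='.
STRICT = re.compile(r"=\?[^?]*\?.\?((?:[^?]*\?[^=])*[^?]*)\?=")

subject = "=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?="
with_question_mark = "=?utf-8?Q?Is it ok? yes?="  # hypothetical input

plain_hits = STRICT.findall(subject)
tricky_hits = STRICT.findall(with_question_mark)

print(plain_hits)   # one payload per pair
print(tricky_hits)  # the inner '?' stays inside the payload
```

One limitation of this sketch: a payload that ends in a literal ? directly before the closing ?= (as in `a??=`) does not match, because that final ? is consumed together with the character after it.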

Answer 4

Thanks for the answers, everyone! The simplest expression that solved my problem was:

=\?(.*?)\?.\?(.*?)\?=

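The only difference between this and the expression I originally posted is the addition of a ? (non-greedy) operator on the first ".*". It was crucial, and I had forgotten it.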
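A quick way to check this final expression is the Python sketch below. The names are illustrative, and joining the pieces with an empty string is an assumption about how the fragments should be recombined.

```python
import re

subject = "=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?="

# Both groups are lazy, so each match stops at the first place it can.
lazy = re.compile(r"=\?(.*?)\?.\?(.*?)\?=")

# With two capture groups, findall returns (charset, payload) tuples.
pairs = lazy.findall(subject)
text = "".join(payload for _charset, payload in pairs)

print(pairs)
print(text)
```

With the first group lazy, each match ends at the nearest ?=, which gives one match per pair.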
