Question

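When I write Erlang programs that do text parsing, I frequently run into situations where I would love to do pattern matching using a regular expression.

For example, I wish I could do something like this, where ~ is a "made up" regular expression matching operator: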

my_function(String ~ ["^[A-Za-z]+[A-Za-z0-9]*$"]) ->
    ....

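I know about the regular expression module (re), but AFAIK you cannot call functions when pattern matching or in guards.

Also, I wish matching strings could be done in a case-insensitive way. This is handy, for example, when parsing HTTP headers. I would love to do something like this, where "Str ~ {Pattern, Options}" means "match Str against pattern Pattern using options Options":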

handle_accept_language_header(Header ~ {"Accept-Language", [case_insensitive]}) ->
    ...

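Two questions:

How do you typically handle this using just standard Erlang? Is there some mechanism or coding style that comes close to this in terms of conciseness and ease of reading?
Is there any work (an EEP?) going on in Erlang to address this?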

Answer 1

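You really don't have much choice other than to run your regexps in advance and then pattern match on the results. Here is a very simple example that comes close to what I think you are after, but it does suffer from the flaw of needing to write each regexp twice. You can make this less painful by using a macro to define each regexp in one place.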

-module(multire).

-compile(export_all).

multire([],_) ->
    nomatch;
multire([RE|RegExps],String) ->
    case re:run(String,RE,[{capture,none}]) of
        match ->
            RE;
        nomatch ->
            multire(RegExps,String)
    end.


test(Foo) ->
    test2(multire(["^Hello","world$","^....$"],Foo),Foo).

test2("^Hello",Foo) ->
    io:format("~p matched the hello pattern~n",[Foo]);
test2("world$",Foo) ->
    io:format("~p matched the world pattern~n",[Foo]);
test2("^....$",Foo) ->
    io:format("~p matched the four chars pattern~n",[Foo]);
test2(nomatch,Foo) ->
    io:format("~p failed to match~n",[Foo]).
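The macro idea mentioned above could look roughly like the sketch below. The module name multire_macros, the macro names, and the classify/1 helper are made up for illustration. It relies on the fact that a macro expanding to a string literal is legal in a function head, so each regexp is written only once.

```erlang
-module(multire_macros).
-export([test/1]).

%% Each pattern is defined exactly once. The macros expand to plain
%% string literals, so they can be used both as data and in function heads.
-define(RE_HELLO, "^Hello").
-define(RE_WORLD, "world$").
-define(RE_FOUR,  "^....$").

%% Return the first regexp in the list that matches String, or nomatch.
multire([], _String) ->
    nomatch;
multire([RE | RegExps], String) ->
    case re:run(String, RE, [{capture, none}]) of
        match   -> RE;
        nomatch -> multire(RegExps, String)
    end.

%% Run the regexps first, then pattern match on which one matched.
test(Foo) ->
    classify(multire([?RE_HELLO, ?RE_WORLD, ?RE_FOUR], Foo)).

classify(?RE_HELLO) -> hello;
classify(?RE_WORLD) -> world;
classify(?RE_FOUR)  -> four_chars;
classify(nomatch)   -> nomatch.
```

With this, multire_macros:test("Hello there") returns hello and multire_macros:test("xyz") returns nomatch. Changing a regexp means editing a single -define line.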

Answer 2

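A possible approach is to use Erlang Web-style annotations (macros) combined with the re Erlang module. An example is probably the best way to illustrate this.

This is what your final code would look like: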

[...]
?MATCH({Regexp, Options}).
foo(_Args) ->
  ok.
[...]

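The MATCH macro is executed just before your foo function. If the regexp pattern does not match, the execution flow fails.

Your match function would be declared as follows: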

?BEFORE.
match({Regexp, Options}, TgtMod, _TgtFun, TgtFunArgs) ->
    %% The string to check is looked up under the `string` key of the target function's arguments.
    String = proplists:get_value(string, TgtFunArgs),
    case re:run(String, Regexp, Options) of
        nomatch ->
            {error, {TgtMod, match_error, []}};
        {match, _Captured} ->
            {proceed, TgtFunArgs}
    end.

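Please note that:

BEFORE indicates that the macro is executed before the target function (an AFTER macro is also available).
match_error is your error handler, specified in your module, and contains the code you want to run when the match fails (possibly nothing, just blocking the execution flow).
The advantage of this approach is that the regexp syntax and options stay consistent with the re module (which avoids confusion).

More information about Erlang Web annotations is available here:

http://wiki.erlang-web.org/Annotations

and here:

http://wiki.erlang-web.org/HowTo/CreateAnnotation

The software is open source, so you might want to reuse its annotation engine.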

Answer 3

You can use the re module:

re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$").
re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$", [caseless]).

Edit:

match(String, Regexps) -> 
  case lists:dropwhile(
               fun({Regexp, Opts}) -> re:run(String, Regexp, Opts) =:= nomatch;
                  (Regexp) -> re:run(String, Regexp) =:= nomatch end,
               Regexps) of
    [R|_] -> R;
    _     -> nomatch
  end.

example(String) ->
  Regexps = ["$RE1^", {"$RE2^", [caseless]}, "$RE3"],
  case match(String, Regexps) of
    nomatch -> handle_error();
    Regexp -> handle_regexp(String, Regexp)
    ...
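For the HTTP header case from the question, the same "first matching regexp wins" idea might be sketched as below. This is only an illustration: the module name header_match, the rule tags, and the return values are assumptions. Each rule carries a tag, so the caller pattern matches on an atom instead of repeating the regexp.

```erlang
-module(header_match).
-export([first_match/2, handle_header/1]).

%% Walk a list of {Tag, Regexp, Opts} rules and return the Tag of the
%% first rule whose regexp matches String, or nomatch if none does.
first_match(_String, []) ->
    nomatch;
first_match(String, [{Tag, Regexp, Opts} | Rest]) ->
    %% {capture, none} makes re:run/3 return just match | nomatch.
    case re:run(String, Regexp, [{capture, none} | Opts]) of
        match   -> Tag;
        nomatch -> first_match(String, Rest)
    end.

%% Hypothetical dispatcher: header names are matched case-insensitively.
handle_header(Header) ->
    Rules = [{accept_language, "^Accept-Language:", [caseless]},
             {host,            "^Host:",            [caseless]}],
    case first_match(Header, Rules) of
        accept_language -> {ok, accept_language};
        host            -> {ok, host};
        nomatch         -> {error, unknown_header}
    end.
```

For example, header_match:handle_header("accept-language: en") returns {ok, accept_language} because of the caseless option.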

Answer 4

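For strings, you can use the 're' module and then iterate over the result set. I'm afraid there is no other way to do it AFAIK: that is what regexps are for.
For the HTTP headers, since there can be many of them, I would consider iterating over the result set a better option than writing one (potentially) very long expression.
EEP work: I don't know.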

Answer 5

Erlang does not handle regular expressions in patterns.
No.

Answer 6

You can't pattern match on regular expressions, sorry. So you have to do

my_function(String) -> Matches = re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$"),
                       ...
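As a rough sketch of what the elided part could look like, the result of re:run/3 can be matched in a case expression. The module name re_case and the return values here are made up for illustration.

```erlang
-module(re_case).
-export([my_function/1]).

%% Run the regexp inside the function body, then pattern match on the result.
my_function(String) ->
    %% {capture, none} reduces the result to match | nomatch.
    case re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$", [{capture, none}]) of
        match   -> {ok, identifier};
        nomatch -> {error, not_an_identifier}
    end.
```

Here re_case:my_function("abc123") returns {ok, identifier}, while re_case:my_function("1abc") returns {error, not_an_identifier}.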

How do I use regular expressions for pattern matching in Erlang?
