How to use regular expressions in SML/NJ

Time: 2016-02-17 16:55:04

Tags: regex smlnj

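I want to take an input string and check whether it matches a regular expression; if it does not, I want to move on to the next regular expression, until all of my regular expressions have been tried. For example, suppose I have the following 3 regular expressions: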

  • regex_1 = [a-zA-Z]*
  • regex_2 = [0-9]*
  • regex_3 = (1tom|2jerry)

Now suppose the input string is:

- val str_input="7569"

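I want to check str_input against regex_1 first; if it does not match, try regex_2; and if that does not match either, finally try regex_3. The question is how to do this with SML/NJ. Thanks.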

1 answer:

Answer 0: (score: 2)

You can do what you want with the regular expression library that ships with SML/NJ. Its documentation is available here: http://www.smlnj.org/doc/smlnj-lib/Manual/regexp-lib-part.html

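As a small getting-started example, here is what you need to do. First, tell SML/NJ that you want to use the regexp library. You do this with a .cm file (CM stands for Compilation Manager, which is SML/NJ's counterpart to a Makefile):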

sources.cm

group is
  $/basis.cm      (* Load standard functions and modules. *)
  $/regexp-lib.cm (* Load the regexp library.             *)
  main.sml        (* Load our own source file.            *)

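Now we can use the regexp library. Unfortunately, it is not very straightforward, because it relies on functors and readers. Basically, what you need is the RE.match function, which takes a list of pairs: the first element of each pair is a regular expression, and the second is a function that is called when that regular expression matches. Given this list of pairs, RE.match scans the input string until it finds a match, at which point it calls the function associated with the regular expression that matched. The result of that function is the result of the whole RE.match call.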
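Since readers are the less familiar part, here is a minimal sketch of the same reader/scanner pattern using only the Basis Library, with no regular expressions involved. The name parseInt is made up for illustration. A scanner such as Int.scan StringCvt.DEC takes a character reader and returns a reader of parsed values, and StringCvt.scanString runs such a scanner on a string, which is the same way RE.match is driven in main.sml.

```sml
(* Int.scan StringCvt.DEC is a scanner: given a character reader, it
   returns a reader that produces ints. StringCvt.scanString supplies a
   character reader over the string and runs the scanner once.
   parseInt is a hypothetical helper name, used only in this sketch. *)
fun parseInt (s : string) : int option =
  StringCvt.scanString (Int.scan StringCvt.DEC) s

val a = parseInt "42abc"  (* SOME 42: only the leading digits are consumed *)
val b = parseInt "abc"    (* NONE: there is no integer at the start *)
```

RE.match regexes has the same shape as Int.scan StringCvt.DEC here: it is a function from a character reader to a reader of results, so it can be handed to StringCvt.scanString in the same way.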

main.sml

structure Main =
  struct
    (**
     * RE is a module created by calling the module-level function (functor)
     * RegExpFn (Fn comes from functor), with two module arguments.
     *
     * The first argument, called P, is the syntax used to write regular
     * expressions in. In this particular case, it's the Awk syntax, which
     * is the only syntax provided by SML/NJ right now.
     *
     * The second argument, called E, is the RegExp engine used behind the
     * scenes to compile and execute the syntax. In this particular case
     * I've opted for ThompsonEngine, which implements Ken Thompson's
     * matching algorithm. Other options are BackTrackEngine and DfaEngine.
     *)
    structure RE = RegExpFn(
      structure P = AwkSyntax
      structure E = ThompsonEngine
      (* structure E = BackTrackEngine *)
      (* structure E = DfaEngine *)
    )

    fun main () =
      let
        (**
         * A list of (regexp, match function) pairs. The function called by
         * RE.match is the one associated with the regexp that matched.
         *
         * The match parameter is described here:
         *   http://www.smlnj.org/doc/smlnj-lib/Manual/match-tree.html
         *)
        val regexes = [
          ("[a-zA-Z]*",   fn match => ("1st", match)),
          ("[0-9]*",      fn match => ("2nd", match)),
          ("1tom|2jerry", fn match => ("3rd", match))
        ]
        val input = "7569"
      in
        (**
         * StringCvt.scanString turns the `input` string into a character
         * stream and runs the scanner returned by `RE.match regexes` on it.
         *
         * It's sort of a streaming matching process. The end result, however,
         * depends on your implementation above, in the match functions.
         *)
        StringCvt.scanString (RE.match regexes) input
      end
  end

You can now use it from the command line:

$ sml sources.cm
Standard ML of New Jersey v110.79 [built: Sun Jan  3 23:12:46 2016]
[scanning sources.cm]
[library $/regexp-lib.cm is stable]
[parsing (sources.cm):main.sml]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-BASIS/(basis.cm):basis-common.cm is stable]
- Main.main ();
[autoloading]
[autoloading done]
val it = SOME ("2nd",Match ({len=4,pos=0},[]))
  : (string * StringCvt.cs Main.RE.match) option
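The question asks for each pattern to be tried in turn against the whole string. One possible way to do that is sketched below, assuming the REGEXP signature of the regexp library provides compileString and prefix, and that MatchTree exposes the Match constructor seen in the output above. The names FirstMatch, matchesAll, patterns and firstMatch are made up for this sketch. It treats a pattern as matching only when the prefix match covers the entire input, which relies on the engine returning the longest prefix match, so take it as a starting point rather than a definitive solution.

```sml
structure FirstMatch =
  struct
    (* Same functor application as in main.sml. *)
    structure RE = RegExpFn(
      structure P = AwkSyntax
      structure E = ThompsonEngine
    )

    (* Hypothetical helper: true when the match of `re` at the start of
       `s` is as long as `s` itself. Assumes the engine reports the
       longest prefix match. *)
    fun matchesAll re s =
      case StringCvt.scanString (RE.prefix re) s of
        SOME (MatchTree.Match ({len, pos = _}, _)) => len = String.size s
      | NONE => false

    (* Labelled patterns, compiled once and tried in list order. *)
    val patterns = [
      ("1st", RE.compileString "[a-zA-Z]*"),
      ("2nd", RE.compileString "[0-9]*"),
      ("3rd", RE.compileString "1tom|2jerry")
    ]

    (* Label of the first pattern that matches all of `s`, if any. *)
    fun firstMatch s =
      Option.map (fn (label, _) => label)
                 (List.find (fn (_, re) => matchesAll re s) patterns)
  end
```

To try it, the file holding this structure would be listed in sources.cm next to main.sml, and FirstMatch.firstMatch "7569" evaluated at the prompt. Comparing the match length with String.size s avoids depending on anchor support in the pattern syntax.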
