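Whitespace is a space, a tab, or a line break (that is, a carriage return or a line feed). I assumed that \s covers the space character, \t, \n, \r, and \f, but when I try to use \s, it does not split the string correctly: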
# let line1 = "We the People of the United States, in Order to form a more perfect";;
# let wsp_regex = Str.regexp "\\s+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We the People of the United State"; ", in Order to form a more perfect"]
# let wsp_regex = Str.regexp "[ \\s]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "State"; ","; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]
# let wsp_regex = Str.regexp "[\\s]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We the People of the United State"; ", in Order to form a more perfect"]
# let wsp_regex = Str.regexp "[ \\s\\t\\n\\r]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "he"; "People"; "of"; "he"; "U"; "i"; "ed"; "S"; "a"; "e"; ","; "i"; "O"; "de"; "o"; "fo"; "m"; "a"; "mo"; "e"; "pe"; "fec"]
# let wsp_regex = Str.regexp "[\s]+";;
Characters 29-31:
Warning 14: illegal backslash escape in string.
val wsp_regex : Str.regexp = <abstr>
# let words = Str.split wsp_regex line1;;
val words : string list =
["We the People of the United State"; ", in Order to form a more perfect"]
# let wsp_regex = Str.regexp "[ \s]+";;
Characters 30-32:
Warning 14: illegal backslash escape in string.
val wsp_regex : Str.regexp = <abstr>
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "State"; ","; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]
# let wsp_regex = Str.regexp "[ \t\n\r\f]+";;
Characters 36-38:
Warning 14: illegal backslash escape in string.
val wsp_regex : Str.regexp = <abstr>
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "o"; "the"; "United"; "States,"; "in"; "Order"; "to"; "orm"; "a"; "more"; "per"; "ect"]
# let wsp_regex = Str.regexp "[\t\n\r\f]+";;
Characters 35-37:
Warning 14: illegal backslash escape in string.
val wsp_regex : Str.regexp = <abstr>
# let words = Str.split wsp_regex line1;;
val words : string list =
["We the People o"; " the United States, in Order to "; "orm a more per"; "ect"]
The only cases that seem to work are:
# let wsp_regex = Str.regexp "[ ]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "States,"; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]
# let wsp_regex = Str.regexp "[ \t\n\r]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "States,"; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]
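I am not sure why the second case works, since "[ \s]+" does not work properly (OCaml thinks I want to split on either a space or the letter s).
All I want is to split on whitespace without using only a literal space, because I also want to catch \t, \n, \r, and \f.
However, I cannot figure out how to write a regular expression in OCaml that splits on whitespace.
If anyone could give me a working expression, I would greatly appreciate it!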
Answer 0 (score: 7)
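If you look at the documentation of the Str module, you will find that \s is not supported. Your first expression therefore splits the words on sequences of the character s. Indeed, that is exactly what you are seeing.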
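None of your other attempts with \s work either, for the same reason: \s is not supported, so inside a bracket expression the s is treated as an ordinary character to split on.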
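To make the point about \s concrete, here is a minimal sketch that compares the pattern "\\s+" with the plain pattern "s+" on the string from the question. It assumes the Str library is linked (for example with #load "str.cma";; in the toplevel); the names on_backslash_s and on_plain_s are only illustrative.

```ocaml
(* Assumes the Str library is available, e.g. #load "str.cma";; in the toplevel. *)
let line1 =
  "We the People of the United States, in Order to form a more perfect"

(* The OCaml string "\\s+" holds the three characters \, s, + .
   Str has no \s class, so this matches runs of the letter s. *)
let on_backslash_s = Str.split (Str.regexp "\\s+") line1

(* The same split written with a plain "s+". *)
let on_plain_s = Str.split (Str.regexp "s+") line1

(* Both patterns cut the string at the single lowercase s in "States". *)
let () = assert (on_backslash_s = on_plain_s)
```

Both results match the output shown in the question for the first attempt.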
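Somewhat surprisingly, even \n (written as two characters, a backslash followed by n) is not supported as a regular expression escape. So if you want to match a newline, the regex pattern itself must contain an actual newline character. In other words, you want the OCaml string "\n", not "\\n". The same is true for \r and \t.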
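The difference between the two string literals can be checked directly. The sketch below is only an illustration (the binding names are made up, and the Str library is assumed to be linked): "\n" is a single newline character, while "\\n" is a backslash followed by the letter n, which Str does not interpret as a newline.

```ocaml
(* Assumes the Str library is available, e.g. #load "str.cma";; in the toplevel. *)
let real_newline = "\n"   (* one character: byte 10 *)
let backslash_n = "\\n"   (* two characters: a backslash, then n *)

let () =
  assert (String.length real_newline = 1);
  assert (String.length backslash_n = 2);
  assert (Char.code real_newline.[0] = 10)

(* With an actual newline in the pattern, the text is split at the line break. *)
let split_ok = Str.split (Str.regexp "\n") "ab\ncd"

(* With "\\n" the pattern does not match the newline, so nothing is split. *)
let split_bad = Str.split (Str.regexp "\\n") "ab\ncd"

let () =
  assert (split_ok = ["ab"; "cd"]);
  assert (split_bad = ["ab\ncd"])
```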
OCaml string syntax does not accept the escape \f. If you want to match a form feed, you need to use its hexadecimal notation, \x0c.
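As a quick sanity check, the hexadecimal escape can be compared with the form feed's character code. This is just a sketch with illustrative names; it relies only on the standard escapes \xhh (hexadecimal) and \ddd (decimal) in OCaml character literals.

```ocaml
(* Form feed is ASCII code 12, written in hexadecimal as \x0c. *)
let form_feed = '\x0c'

(* The decimal escape \012 denotes the same character. *)
let form_feed_decimal = '\012'

let () =
  assert (Char.code form_feed = 12);
  assert (form_feed = form_feed_decimal)
```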
Putting it all together, your pattern should be: "[ \n\r\x0c\t]+".
# Str.split (Str.regexp "[ \n\r\x0c\t]+") line1;;
- : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "States,"; "in";
"Order"; "to"; "form"; "a"; "more"; "perfect"]
There is also a Perl-compatible regular expression package that you might find more comfortable to use: https://opam.ocaml.org/packages/pcre/pcre.7.1.5/
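If you prefer to stay with Str, the pattern recommended above can be wrapped in a small helper. This is a minimal sketch, not part of the Str API: the names whitespace and split_whitespace are hypothetical, and the Str library is assumed to be linked.

```ocaml
(* Assumes the Str library is available, e.g. #load "str.cma";; in the toplevel. *)

(* Space, tab, newline, carriage return, and form feed (\x0c),
   written as real characters inside the bracket expression. *)
let whitespace = Str.regexp "[ \t\n\r\x0c]+"

(* Hypothetical helper: split a string on runs of whitespace.
   Str.split ignores delimiters at the start and end of the string. *)
let split_whitespace (s : string) : string list =
  Str.split whitespace s

let words = split_whitespace "  We\tthe\nPeople \r\n of\x0cthe  "

let () = assert (words = ["We"; "the"; "People"; "of"; "the"])
```

Compiling the regular expression once and reusing it avoids rebuilding it on every call.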