Question

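Whitespace is a space, a tab, or a line break (that is, a carriage return or a line feed).

I assumed \s covers the space character, \t, \n, \r and \f.

But when I try to use \s, it does not split the string correctly: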

# let line1 = "We the People of the United States, in Order to form a more perfect";;

# let wsp_regex = Str.regexp "\\s+";;
# let words = Str.split wsp_regex line1;;
val words : string list = 
["We the People of the United State"; ", in Order to form a more perfect"]

# let wsp_regex = Str.regexp "[ \\s]+";;
# let words = Str.split wsp_regex line1;;
val words : string list = 
["We"; "the"; "People"; "of"; "the"; "United"; "State"; ","; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]

# let wsp_regex = Str.regexp "[\\s]+";;
# let words = Str.split wsp_regex line1;;
val words : string list = 
["We the People of the United State"; ", in Order to form a more perfect"]

# let wsp_regex = Str.regexp "[ \\s\\t\\n\\r]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =                                                         
["We"; "he"; "People"; "of"; "he"; "U"; "i"; "ed"; "S"; "a"; "e"; ","; "i"; "O"; "de"; "o"; "fo"; "m"; "a"; "mo"; "e"; "pe"; "fec"]

# let wsp_regex = Str.regexp "[\s]+";;
Characters 29-31:                                                               
Warning 14: illegal backslash escape in string.                                 
val wsp_regex : Str.regexp = <abstr>   

# let words = Str.split wsp_regex line1;;
val words : string list =                                                         
["We the People of the United State"; ", in Order to form a more perfect"]

# let wsp_regex = Str.regexp "[ \s]+";;
Characters 30-32:                                                               
Warning 14: illegal backslash escape in string.                                 
val wsp_regex : Str.regexp = <abstr>
# let words = Str.split wsp_regex line1;;
val words : string list =                                                         
["We"; "the"; "People"; "of"; "the"; "United"; "State"; ","; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]

# let wsp_regex = Str.regexp "[ \t\n\r\f]+";;
Characters 36-38:                                                               
Warning 14: illegal backslash escape in string.                                 
val wsp_regex : Str.regexp = <abstr>  
# let words = Str.split wsp_regex line1;;
val words : string list =                                                         
["We"; "the"; "People"; "o"; "the"; "United"; "States,"; "in"; "Order"; "to"; "orm"; "a"; "more"; "per"; "ect"] 

# let wsp_regex = Str.regexp "[\t\n\r\f]+";;
Characters 35-37:                                                               
Warning 14: illegal backslash escape in string.                                 
val wsp_regex : Str.regexp = <abstr>
# let words = Str.split wsp_regex line1;;
val words : string list =                                                         
["We the People o"; " the United States, in Order to "; "orm a more per"; "ect"]

It seems the only cases that work are:

# let wsp_regex = Str.regexp "[ ]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =                                                         
["We"; "the"; "People"; "of"; "the"; "United"; "States,"; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]

# let wsp_regex = Str.regexp "[ \t\n\r]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =                                                         
["We"; "the"; "People"; "of"; "the"; "United"; "States,"; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]

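I'm not sure why the second case works, given that [ \s]+ does not work (OCaml thinks I want to split on a space or on s).

All I want is to split on whitespace without matching only the space character, because I also want to catch \t, \n, \r and \f.

But I can't seem to figure out how to write a regular expression in OCaml that splits on whitespace.

If anyone could give me a working expression, I would be very grateful!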

Answer 1

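In the documentation of the Str module you will find that \s is not supported. So your first expression splits the words on sequences of the character s, and that is indeed what you are seeing.

Every other attempt to use \s fails for the same reason: \s is not supported.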

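Surprisingly, even \n (the two-character notation) is not supported as a regular expression. So if you want to match a newline, you need an actual newline character in the regex pattern. In other words, you want your string to contain "\n", not "\\n": the escape has to be interpreted by the OCaml string literal, because Str will not interpret it. The same holds for \r and \t.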
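
To make the difference concrete, here is a minimal sketch that lists the characters Str would actually receive for each spelling. It uses only the standard library; the helper name chars_of is made up for this illustration.

```ocaml
(* chars_of is a hypothetical helper: it lists the characters of a string,
   i.e. what a regex engine receives after OCaml has processed the literal. *)
let chars_of (s : string) : char list =
  List.init (String.length s) (String.get s)

(* "[\\n]" holds four characters: '[', a backslash, 'n', ']'.
   As a Str character set it matches a backslash or the letter n. *)
let escaped_twice = chars_of "[\\n]"

(* "[\n]" holds three characters: '[', a real line feed, ']'.
   As a Str character set it matches a line feed. *)
let escaped_once = chars_of "[\n]"

let () =
  Printf.printf "%d %d\n"
    (List.length escaped_twice) (List.length escaped_once)
```

This is also why "[ \\s\\t\\n\\r]+" in the question ate the letters s, t, n and r.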

OCaml string syntax does not accept the notation \f. If you want to match a form feed, you need to use its hexadecimal notation, \x0c.
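
As a quick sanity check, the following sketch confirms that \x0c is the form feed character (code 12) and that OCaml's decimal escape \012 denotes the same character. The names form_feed and pattern are only illustrative.

```ocaml
(* '\x0c' is a hexadecimal escape; '\012' is a decimal escape in OCaml.
   Both denote character code 12, the ASCII form feed. *)
let form_feed = '\x0c'
let same_char = (form_feed = '\012')

(* A character set with space, line feed, carriage return, form feed and tab.
   After OCaml processes the literal it is 8 characters long:
   '[', ' ', LF, CR, FF, TAB, ']' and '+'. *)
let pattern = "[ \n\r\x0c\t]+"

let () =
  Printf.printf "%d %b %d\n"
    (Char.code form_feed) same_char (String.length pattern)
```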

Putting this together, your pattern should be: "[ \n\r\x0c\t]+".

# Str.split (Str.regexp "[ \n\r\x0c\t]+") line1;;
- : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "States,"; "in";
 "Order"; "to"; "form"; "a"; "more"; "perfect"]

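If you need this in more than one place, the pattern can be wrapped in a small helper. This is only a sketch: the names whitespace and split_on_whitespace are made up here, and it assumes the str library is linked (for example with #require "str";; in a toplevel that has findlib, or -package str with ocamlfind).

```ocaml
(* Assumes the str library is linked; see the note above. *)

(* Compile the pattern once: one or more of space, LF, CR, FF or TAB. *)
let whitespace = Str.regexp "[ \n\r\x0c\t]+"

(* Hypothetical helper: split a string on runs of whitespace.
   Str.split ignores delimiters at the start and end of the string. *)
let split_on_whitespace (s : string) : string list =
  Str.split whitespace s

let () =
  split_on_whitespace "We the\tPeople\r\nof  the\x0cUnited States,"
  |> List.iter print_endline
```
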
There is also a Perl-compatible regular expression package that you might find more comfortable to use: https://opam.ocaml.org/packages/pcre/pcre.7.1.5/
