Question

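I would like to use Perl-compatible regular expressions in Haskell, in particular the shorthand character classes such as \w and \s.

I understand that these are not available in the standard POSIX module: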

import Text.Regex.Posix

"this is a string" =~ "\S+"

<interactive>:3:25: error:
    lexical error in string/character literal at character 'S'

I expected the pcre package to handle this, but the result is the same:

import Text.Regex.PCRE

"this is a string" =~ "\S+"

<interactive>:2:25: error:
    lexical error in string/character literal at character 'S'

In Python, it works like this:

>>> import re
>>> re.findall(r'\S+', "this is a string")
['this', 'is', 'a', 'string']

How can I use these regex character classes in Haskell?

Answer 1

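This has nothing to do with regular expressions, or with Haskell versus Python. The error comes from the Haskell lexer, which rejects \S as an invalid escape sequence in a string literal before any regex library ever sees the pattern. Note that you wouldn't† write re.findall("\S+", "this is a string") in Python either. You need raw string literals to use backslashes freely. Haskell has no built-in raw string literals, but it does have quasi-quotes, which let you emulate them.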

Prelude> :set -XQuasiQuotes 
Prelude> :m +Text.RawString.QQ Text.Regex.PCRE
Prelude Text.RawString.QQ Text.Regex.PCRE> "this is a string" =~ [r|\S+|] :: String
"this"
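
The GHCi session above returns only the first match. As a minimal sketch of the same idea in a source file, assuming the regex-pcre and raw-strings-qq packages are installed, the following collects every match, similar to Python's re.findall. The name nonSpaceRuns is made up for illustration.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Text.RawString.QQ (r)
import Text.Regex.PCRE (getAllTextMatches, (=~))

-- Hypothetical helper: every maximal run of non-whitespace characters.
-- The quasi-quote [r|...|] passes the backslash through untouched,
-- so the regex engine receives the three characters \S+ as written.
nonSpaceRuns :: String -> [String]
nonSpaceRuns input = getAllTextMatches (input =~ [r|\S+|])
```

With these assumptions, nonSpaceRuns "this is a string" should evaluate to ["this","is","a","string"].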

Alternatively, simply double the backslash to escape it: "this is a string" =~ "\\S+"
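
To see why doubling the backslash works, it may help to look at what the literal contains once the Haskell lexer has processed it. The sketch below uses only the Prelude; the names escapedPattern and spelledOut are illustrative.

```haskell
-- The escaped literal as written in source code. After lexing, "\\"
-- becomes a single backslash, so this string has three characters.
escapedPattern :: String
escapedPattern = "\\S+"

-- The same pattern spelled out character by character, for comparison.
spelledOut :: String
spelledOut = ['\\', 'S', '+']
```

Here escapedPattern == spelledOut holds, so the regex library receives exactly the pattern \S+.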

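† Actually, it turns out that the single-backslash version does work in Python even with ordinary quotes, but this appears to be a fallback rule. It is best not to rely on it.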

Answer 2

With Posix, you can use these equivalents:

\w ...  [\p{L}\p{M}\p{Nd}\p{Nl}\p{Pc}]
\W ...  [^\p{L}\p{M}\p{Nd}\p{Nl}\p{Pc}]
\s ...  [[:space:]]
\S ...  [^[:space:]]
\d ...  [[:digit:]]
\D ...  [^[:digit:]]
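
As a rough sketch of how the bracket-class rows of this table might be used, assuming the regex-posix package is installed: the function names are hypothetical, and only the [[:space:]] and [[:digit:]] classes are exercised here, since support for \p{...} depends on the underlying engine.

```haskell
import Text.Regex.Posix (getAllTextMatches, (=~))

-- Hypothetical helper: POSIX spelling of \S+, using a negated class.
nonSpaceRuns :: String -> [String]
nonSpaceRuns input = getAllTextMatches (input =~ "[^[:space:]]+")

-- Hypothetical helper: POSIX spelling of \d+.
digitRuns :: String -> [String]
digitRuns input = getAllTextMatches (input =~ "[[:digit:]]+")
```

No backslashes appear in these patterns, so ordinary string literals are enough and no escaping is needed.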

With the PCRE package, you can use:

\w ...  [\p{L}\p{M}\p{Nl}\p{Nd}\p{Pc}]
\W ...  [^\p{L}\p{M}\p{Nl}\p{Nd}\p{Pc}]
\s ...  [\p{Z}\t\n\cK\f\r\x85]
\S ...  [^\p{Z}\t\n\cK\f\r\x85]
\d ...  \p{Nd}
\D ...  \P{Nd}
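
One way to read the Unicode categories in this table is to spell them out as plain predicates. The following is only an illustrative sketch using Data.Char from base, not what any regex engine does internally; isWordChar, isDecimalDigit, and wordRuns are made-up names.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isLetter, isMark)

-- Rough analogue of \w as expanded above: letters (L), marks (M),
-- decimal digits (Nd), letter numbers (Nl), connector punctuation (Pc).
isWordChar :: Char -> Bool
isWordChar c =
  isLetter c
    || isMark c
    || generalCategory c `elem` [DecimalNumber, LetterNumber, ConnectorPunctuation]

-- Analogue of \d read as \p{Nd}.
isDecimalDigit :: Char -> Bool
isDecimalDigit c = generalCategory c == DecimalNumber

-- Maximal runs of word characters, comparable to matching \w+ repeatedly.
wordRuns :: String -> [String]
wordRuns s = case dropWhile (not . isWordChar) s of
  [] -> []
  rest ->
    let (word, more) = span isWordChar rest
     in word : wordRuns more
```

Under this reading, the underscore counts as a word character because it belongs to the connector punctuation category, while the hyphen does not.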

Perl-compatible regular expressions and unrecognized character classes in Haskell
