Question

In Scheme,

I have the string "hello hellu-#\"hella.helloo,hallo#\return#\""

and I want to turn it into the list ("hello" "hellu" "hella" "helloo" "hallo"),

splitting on space, hyphen, double quote, dot, comma, and return.

I tried:

(regexp-split #rx"( +)|(#\-)|(#\")|(#\.)|(,)|(#\return)" string)

but #\- and #\. cause errors.

Any hints or solutions?

Thanks

Answer 1

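It looks like you are confusing the syntax for characters (#\foo) with the syntax for strings, and doing so in both the string and the regexp. So my guess is that the string you want to split is actually: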

"hello hellu-\"hella.helloo,hallo\n\""

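where \" stands for a double-quote character and \n stands for a newline. If that is the case, then (again, guessing at your intention) the regexp should be: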

(regexp-split #rx"( +)|(\-)|(\")|(\.)|(,)|(\n)" string)

But this does not work either, because \- and \. are not valid escapes in a string literal (Racket uses C-like escapes), so change it to:

(regexp-split #rx"( +)|(-)|(\")|(.)|(,)|(\n)" string)

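This does not work either, because . has its usual "any character" meaning in a regexp, so you want to escape it with a backslash. As in many other string syntaxes, you get a backslash by escaping it with another backslash, so now we have a version that is finally close to working: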

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"( +)|(-)|(\")|(\\.)|(,)|(\n)" string)
'("hello" "hellu" "" "hella" "helloo" "hallo" "" "")
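The difference between a character literal, a string, and an escaped regexp metacharacter can be checked in isolation. The following is a minimal sketch; the names dot-char, dot-string, and escaped-dot are illustrative and not part of the original question.

```racket
#lang racket

;; #\. is a character literal; "." is a one-character string.
(define dot-char #\.)
(define dot-string (string dot-char))    ; "."

;; Inside a string (or #rx) literal, \\ denotes a single backslash,
;; so "\\." is the two characters \ and .
(define escaped-dot "\\.")
(string-length escaped-dot)              ; 2

;; Unescaped, . matches any character, so the first match is "a".
(regexp-match #rx"." "a.b")              ; '("a")

;; Escaped, it matches only a literal dot.
(regexp-match #rx"\\." "a.b")            ; '(".")
```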

First, the regexp can be improved considerably. The parentheses are not needed for splitting:

(regexp-split #rx" +|-|\"|\\.|,|\n" string)

Then, instead of a bunch of single characters joined with |, you can use a character range:

(regexp-split #rx" +|[-\".,\n]" string)

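Note that it is important for - to be the first (or last) character in the range, so that it does not take on its usual meaning of denoting a range of characters. Next, it seems that you really want any sequence of these characters to act as a single separator, which avoids some of the empty strings in the result: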

(regexp-split #rx" +|[-\".,\n]+" string)

In that case you can also throw the space into the range (carefully putting it after the -, as noted above). We now get:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"[- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo" "")
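The effect of where - sits inside a character range can be seen on a small input. This is only a sketch; as-range and as-literal are illustrative names.

```racket
#lang racket

;; Between two characters, - denotes a range: [a-c] matches a, b, or c.
(define as-range (regexp-match* #rx"[a-c]" "a-b-c"))     ; '("a" "b" "c")

;; Placed first, - is literal: [-ac] matches -, a, or c (but not b).
(define as-literal (regexp-match* #rx"[-ac]" "a-b-c"))   ; '("a" "-" "-" "c")
```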

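Finally, you probably want to get rid of the trailing empty string in the regexp-split result too. Technically it should be there, because there is a sequence of matching characters just before the end of the string. A simple way around this in Racket is to use the complementary regexp-match*, which returns a list of the matches rather than splitting on them: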

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[- \".,\n]+" string)
'(" " "-\"" "." "," "\n\"")

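This is obviously broken, since it gives you the separators rather than what lies between them. But because this regexp is just a range of characters, it is easy to fix: negate the character range and you get what you want: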

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[^- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo")
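The final approach could be wrapped in a small helper. The sketch below makes two assumptions not confirmed by the thread: the names split-words and split-words/filter are hypothetical (they are not Racket built-ins), and \r is added to the range on the guess that "return" in the question meant a carriage return.

```racket
#lang racket

;; split-words is a hypothetical helper name, not a Racket built-in.
;; \r is included on the assumption that "return" meant carriage return.
(define (split-words str)
  ;; Collect every maximal run of non-separator characters.
  (regexp-match* #rx"[^- \".,\r\n]+" str))

;; Alternative: keep regexp-split and drop the empty strings afterwards.
(define (split-words/filter str)
  (filter (lambda (s) (not (string=? s "")))
          (regexp-split #rx"[- \".,\r\n]+" str)))

(define sample "hello hellu-\"hella.helloo,hallo\r\n\"")
(split-words sample)         ; '("hello" "hellu" "hella" "helloo" "hallo")
(split-words/filter sample)  ; same list
```

The regexp-match* version needs no post-processing, which is why the answer prefers it; the filtering variant is shown only for comparison.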
