Question

My question is the same as Split string including regular expression match, but for Scala. Unfortunately, the JavaScript solution does not work in Scala.

I am parsing some text. Suppose I have this string:

"hello wold <1> this is some random text <3> foo <12>"

I would like to get the following Seq: "hello world" :: "<1>" :: "this is some random text" :: "<3>" :: "foo" :: "<12>".

Note that I split the string every time I encounter a <number> sequence.

Answer 1

val s = "hello wold <1> this is some random text <3> foo <12>"
s: java.lang.String = hello wold <1> this is some random text <3> foo <12>

s.split("""((?=<\d{1,3}>)|(?<=<\d{1,3}>))""")
res0: Array[java.lang.String] = Array(hello wold , <1>,  this is some random text , <3>,  foo , <12>)
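A minimal, self-contained sketch that wraps the split above in a reusable function. The object and method names are hypothetical, and the trim/filter step is an assumption added so the result matches the Seq asked for in the question instead of keeping the surrounding spaces. The look-behind stays bounded (\d{1,3}) because java.util.regex rejects look-behinds without an obvious maximum length.

```scala
object LookaroundSplit {
  // Zero-width split points: just before a tag (look-ahead) or just after one (look-behind).
  // \d{1,3} keeps the look-behind's maximum length bounded, as java.util.regex requires.
  private val Boundary = """((?=<\d{1,3}>)|(?<=<\d{1,3}>))"""

  // Hypothetical helper: split at the boundaries, trim each piece,
  // and drop empty pieces so only text runs and tags remain.
  def splitKeepingTags(s: String): List[String] =
    s.split(Boundary).toList.map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val s = "hello wold <1> this is some random text <3> foo <12>"
    println(splitKeepingTags(s))
  }
}
```

Tags with four or more digits (for example <1234>) would not be recognized by this pattern unless the bound is raised.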

Did you actually test your edit? Changing the quantifier to \d+ makes the pattern invalid. See this question.

s.split("""((?=<\d+>)|(?<=<\d+>))""")
java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 19
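Because the look-behind must have a bounded length, one workaround is simply to raise the bound (for example \d{1,9}). Another, sketched below, avoids look-behind entirely: it walks the tag matches with scala.util.matching.Regex.findAllMatchIn and collects the text between them, so \d+ works for any number of digits. This assumes, as in the question, that tags are always < followed by digits and >; the names TagTokenizer and tokenize are hypothetical.

```scala
import scala.util.matching.Regex

object TagTokenizer {
  // A tag such as <1> or <12345>; no look-behind is involved, so \d+ is accepted.
  private val Tag: Regex = """<\d+>""".r

  // Hypothetical helper: for each tag, emit the trimmed text before it
  // (if non-empty), then the tag itself; finally emit any trailing text.
  def tokenize(s: String): List[String] = {
    val out = List.newBuilder[String]
    var last = 0 // index just past the previous tag
    for (m <- Tag.findAllMatchIn(s)) {
      val before = s.substring(last, m.start).trim
      if (before.nonEmpty) out += before
      out += m.matched
      last = m.end
    }
    val rest = s.substring(last).trim // text after the last tag, if any
    if (rest.nonEmpty) out += rest
    out.result()
  }

  def main(args: Array[String]): Unit =
    println(tokenize("hello wold <1> this is some random text <3> foo <12345>"))
}
```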

Answer 2

Here is a quick but somewhat hacky solution:

scala> val str = "hello wold <1> this is some random text <3> foo <12>"
str: String = hello wold <1> this is some random text <3> foo <12>

scala> str.replaceAll("<\\d+>", "_$0_").split("_")
res0: Array[String] = Array("hello wold ", <1>, " this is some random text ", <3>, " foo ", <12>)

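The obvious problem with this solution is that it gives the underscore character a special meaning. If an underscore occurs naturally in the original string, you get the wrong result. So you must either pick another magic character sequence that you are sure will never appear in the original string, or do some extra escaping and unescaping.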
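A sketch of the same replace-then-split idea with the marker made explicit, assuming the NUL character (\u0000) never occurs in your input; the helper name splitWithMarker is hypothetical. Instead of silently producing wrong pieces, it checks that assumption with require, and it uses Matcher.quoteReplacement and Pattern.quote so the marker is treated literally in both the replacement and the split.

```scala
import java.util.regex.{Matcher, Pattern}

object MarkerSplit {
  // Hypothetical helper: surround every tag with `marker`, then split on the marker.
  // Assumption: `marker` does not already occur in `s`; this is checked up front.
  def splitWithMarker(s: String, marker: String = "\u0000"): List[String] = {
    require(marker.nonEmpty && !s.contains(marker), "marker must be non-empty and absent from the input")
    val m = Matcher.quoteReplacement(marker) // escape any $ or \ in the marker
    s.replaceAll("<\\d+>", m + "$0" + m)     // $0 is the whole matched tag
      .split(Pattern.quote(marker))          // split on the marker as a literal
      .toList
      .filter(_.nonEmpty)                    // adjacent tags leave empty pieces
  }

  def main(args: Array[String]): Unit =
    println(splitWithMarker("snake_case <1> more_text <22>"))
}
```

Unlike the underscore version, underscores in the text are left intact, and the surrounding spaces are kept exactly as in the original answer's output.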

Another solution uses look-ahead and look-behind patterns, as described in this question.

How to split a String in Scala but keep the parts that match a regular expression?

2 answers: