Java / Clojure: splitting on a multi-character delimiter while keeping the delimiters

Time: 2013-03-08 04:25:49

Tags: java string clojure split tokenize

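I'm working on a project in Clojure, which can interop with any Java class, so the answer to my question can be in either Java or Clojure.

Basically, I need to split a string into its components on a given delimiter (which will be more than one character long) while keeping the delimiters in the result.

For example: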

splitting "test:test:test" on ":"  => [ "test" ":" "test" ":" "test" ]
splitting "::test::test::" on "::" => [ "::" "test" "::" "test" "::" ]

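The closest I've come is Clojure's clojure.string/split, but it doesn't actually return the delimiters. The second closest was StringTokenizer, which does return the delimiters but doesn't accept multi-character delimiters.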
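To illustrate the StringTokenizer limitation, here is a minimal Java sketch (the class and method names are made up for this example). StringTokenizer treats its delimiter argument as a set of single characters, so with returnDelims set to true a "::" delimiter comes back as two separate ":" tokens:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Collects all tokens; returnDelims=true makes the tokenizer
    // return delimiter characters as tokens too.
    static List<String> tokens(String s, String delims) {
        StringTokenizer st = new StringTokenizer(s, delims, true);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Each ':' is reported on its own, never as "::".
        System.out.println(tokens("::test::test::", "::"));
        // prints [:, :, test, :, :, test, :, :]
    }
}
```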

Does anyone know of a solution other than breaking the string into a sequence of characters and running some weird reduce over it?

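2 answers:

Answer 0: (score: 8)

Here's a version that builds a regex to match the gaps just before and just after each delimiter, rather than the delimiter string itself (assuming d contains no regex special characters):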

=> (defn split-with-delim [s d]
     (clojure.string/split s (re-pattern (str "(?=" d ")|(?<=" d ")"))))
#'user/split-with-delim
=> (split-with-delim "test:test:test" ":")
["test" ":" "test" ":" "test"]
=> (split-with-delim "::test::test::" "::")
["" "::" "test" "::" "test" "::"]
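The same lookaround idea can be written in plain Java. The sketch below is an illustration rather than part of the answer; the class and method names are invented. It uses Pattern.quote so the delimiter is matched literally, which removes the "no regex special characters" assumption:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LookaroundSplit {
    // Splits at the zero-width positions just before and just after
    // each literal occurrence of delim, so the delimiters survive.
    static List<String> splitKeepingDelim(String s, String delim) {
        String q = Pattern.quote(delim); // match delim literally
        Pattern p = Pattern.compile("(?=" + q + ")|(?<=" + q + ")");
        return Arrays.asList(p.split(s));
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingDelim("test:test:test", ":"));
        // A delimiter made of regex metacharacters also works.
        System.out.println(splitKeepingDelim("a||b", "||"));
    }
}
```

Note that since Java 8, Pattern.split no longer produces a leading empty string for a zero-width match at the start of the input, so on a current JVM "::test::test::" should split without the leading "" shown in the transcript above.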

Answer 1: (score: 4)

(require '[clojure.string :as str]) ; d is compiled as a regex

(defn split-it [s d]
  (interpose d (str/split s (re-pattern d))))

(split-it "test:test:test" ":")
=> ("test" ":" "test" ":" "test")

(split-it "::test::test::" "::")
=> ("" "::" "test" "::" "test")
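As the second result shows, interposing after a plain split loses the trailing delimiter and leaves a leading empty string. A regex-free alternative, sketched here in Java with invented names, is to scan with String.indexOf and emit each delimiter where it is found, skipping empty pieces. Under those assumptions it produces the output asked for in the question:

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfSplit {
    // Walks the string with indexOf, adding the text between
    // delimiters (when non-empty) and every delimiter occurrence.
    static List<String> splitKeepingDelim(String s, String delim) {
        if (delim.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> out = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = s.indexOf(delim, start)) >= 0) {
            if (hit > start) {
                out.add(s.substring(start, hit)); // text before the delimiter
            }
            out.add(delim);                       // the delimiter itself
            start = hit + delim.length();
        }
        if (start < s.length()) {
            out.add(s.substring(start));          // trailing text, if any
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingDelim("test:test:test", ":"));
        // prints [test, :, test, :, test]
        System.out.println(splitKeepingDelim("::test::test::", "::"));
        // prints [::, test, ::, test, ::]
    }
}
```

Because the delimiter is never compiled as a regex, special characters in it need no escaping.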