Suppose you have a Clojure source file. The file itself might look like this:
(ns foo
"We've got some sort of docstring here. \"this\" would be an example of
some sort of escaped text within that docstring.")
(defn bar
"Another docstring down here."
[x]
true)
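Now, let's suppose I want to capture the contents of one or both of the docstrings here.
The problem is that if I slurp the file into a Clojure REPL, everything shows up double-escaped. So it looks like this: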
(ns foo\n\"We've got some sort of docstring here. \\\"this\\\" would be an example of\nsome sort of escaped text within that docstring.\")\n\n(defn bar\n\"Another docstring down here.\"\n[x]\ntrue)
The regex I've been using so far is:
(re-find #"\"(\\.|[^\"])*\"" source-string)
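This seems reasonable, since it passes every trivial test case I can come up with. However, it doesn't take a particularly large corpus to make it hit a StackOverflowError.
So, repository of great wizards, I turn to you. Should I be using a different regex? Is a regex simply the wrong answer here? If so, what is the right one?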
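A likely cause of the overflow, as commonly reported for Java's regex engine, is that a repeated group containing an alternation such as `(\\.|[^\"])*` is matched recursively, using roughly one stack frame per repetition, so a long string literal exhausts the stack. The sketch below shows a commonly suggested "unrolled" rewrite in which runs of ordinary characters are consumed by a plain character class and the group repeats only once per escape sequence. The names `string-literal-re` and `first-string-literal` are hypothetical, and a regex like this still cannot tell a real string literal from a quote inside a comment or a `\"` character literal.

```clojure
(require '[clojure.edn :as edn])

;; Unrolled form: opening quote, a run of ordinary characters, then zero or
;; more (escape sequence + run of ordinary characters), then the closing quote.
;; (?s) lets `.` match a newline that directly follows a backslash.
(def string-literal-re
  #"(?s)\"[^\"\\]*(?:\\.[^\"\\]*)*\"")

(defn first-string-literal
  "Hypothetical helper: returns the first string literal in `source` exactly
  as written (quotes and escapes included), or nil if there is none."
  [source]
  (re-find string-literal-re source))

(def source-string
  "(ns foo\n\"A docstring with \\\"escaped\\\" text.\")")

;; The raw literal, still escaped as in the source file.
(first-string-literal source-string)

;; Reading the literal with the EDN reader yields the unescaped contents.
(edn/read-string (first-string-literal source-string))
```

Because the pattern has no capturing group, `re-find` returns the matched text directly rather than a vector of groups.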
Answer 0 (score: 0)
You can use the following, which is based on clojure.edn/read:
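(require 'clojure.edn 'clojure.string)

;; Reads successive top-level forms from a java.io.PushbackReader.
(defn expr-seq [in]
  (let [r (.read in)]      ; peek one character to detect end of input
    (if (= -1 r)
      nil
      (do
        (.unread in r)     ; push the character back before reading a form
        (cons (clojure.edn/read in) (lazy-seq (expr-seq in)))))))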
;; For forms like (ns foo "doc") or (defn bar "doc" ...), the docstring is the third element.
(defn doc-string [[_ _ ds]]
  (when (string? ds) ds))
(def sexps
  (with-open [in (-> (slurp "/path/to/file.clj")
                     clojure.string/trim
                     java.io.StringReader.
                     java.io.PushbackReader.)]
    (doall (expr-seq in))))
; docstrings
(map doc-string sexps)
=> ("We've got some sort of docstring here. \"this\" would be an example of\n some sort of escaped text within that docstring." "Another docstring down here.")
; all strings
(filter string? (tree-seq coll? seq sexps))
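The trim-and-unread approach above relies on the input ending exactly at the last form. As a minimal sketch of an alternative, clojure.edn/read also accepts an options map with an `:eof` value that it returns at end of input instead of throwing, which tolerates trailing whitespace and comments. The helper name `read-forms` and the `sample` string are hypothetical, and `doc-string` is repeated only to keep the example self-contained. Note that either approach only works on files that are valid EDN; reader syntax such as `#(...)` or `#"regex"` literals is rejected by clojure.edn/read.

```clojure
(require '[clojure.edn :as edn])
(import '(java.io PushbackReader StringReader))

(defn read-forms
  "Hypothetical helper: reads every top-level form from the string `s` with
  clojure.edn/read, stopping when the reader returns the end-of-input sentinel."
  [s]
  (with-open [in (PushbackReader. (StringReader. s))]
    (let [eof (Object.)]                         ; unique end-of-input sentinel
      (loop [forms []]
        (let [form (edn/read {:eof eof} in)]
          (if (identical? form eof)
            forms
            (recur (conj forms form))))))))

;; Same helper as in the answer, repeated so this block stands alone.
(defn doc-string [[_ _ ds]]
  (when (string? ds) ds))

;; Assumed sample input, ending with a comment and a newline.
(def sample
  "(ns foo\n  \"Namespace doc with \\\"quotes\\\".\")\n\n(defn bar\n  \"Another docstring down here.\"\n  [x]\n  true)\n\n; trailing comment\n")

;; Docstrings of each top-level form (nil for forms without one).
(map doc-string (read-forms sample))

;; Every string anywhere in the forms.
(filter string? (tree-seq coll? seq (read-forms sample)))
```

Collecting into a vector inside `with-open` means all forms are read before the reader is closed, so no `doall` is needed.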