Clojure: Get a list of regex matches

Time: 2010-10-18 20:34:07

Tags: regex clojure

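Perhaps I'm going about this all wrong, but I'm trying to get all the matches in a string for a particular regex pattern. I'm using re-matcher to get a Match object, which I pass to re-find, giving me (full-string-match, grouped-text) pairs. How would I get a sequence of all the matches produced by the Match object?

In Clojuresque Python, it would look like: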

pairs = []
match = re-matcher(regex, line)

while True:
    pair = re-find(match)
    if not pair: break
    pairs.append(pair)

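Any suggestions?

1 Answer:

Answer 0 (score: 23)

You probably want to use the built-in re-seq together with Clojure's built-in regex literal. Don't mess with the underlying Java objects unless you really have to.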

(doc re-seq)


clojure.core/re-seq
([re s])
  Returns a lazy sequence of successive matches of pattern in string,
  using java.util.regex.Matcher.find(), each such match processed with
  re-groups. 

For example:

user> (re-seq #"the \w+" "the cat sat on the mat")
("the cat" "the mat")
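For comparison, here is a rough sketch of what the loop from the question might look like if written by hand with re-matcher and re-find. The helper name all-matches is hypothetical and only for illustration; re-seq already does this for you, and lazily.

```clojure
;; Hypothetical helper (not part of clojure.core): collects every match
;; eagerly by calling re-find on the same matcher until it returns nil.
(defn all-matches [re s]
  (let [m (re-matcher re s)]
    (loop [acc []]
      (if-let [match (re-find m)]
        (recur (conj acc match))
        acc))))

(all-matches #"the \w+" "the cat sat on the mat")
;; => ["the cat" "the mat"]
```

The matcher is a mutable Java object, so each call to re-find advances it. That statefulness is one reason to prefer re-seq unless you need the matcher itself.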

In answer to the follow-up comment, group captures will result in a vector of strings with an element for each part of the group in a match:

user> (re-seq #"the (\w+(t))" "the cat sat on the mat")
(["the cat" "cat" "t"] ["the mat" "mat" "t"])

You can extract a specific element by taking advantage of the elegant fact that vectors are functions of their indices.

user> (defn extract-group [n] (fn [group] (group n)))
#'user/extract-group
user> (let [matches (re-seq #"the (\w+(t))" "the cat sat on the mat")]
       (map (extract-group 1) matches))
("cat" "mat")
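As a possible alternative to a custom extract-group, the core functions second, nth and get can pull out the same elements. This is only a sketch; the var name matches is chosen here for illustration.

```clojure
;; `matches` is an illustrative name, not something from the original answer.
(def matches (re-seq #"the (\w+(t))" "the cat sat on the mat"))

;; The first capture group sits at index 1 of each match vector.
(map second matches)
;; => ("cat" "mat")

;; nth works for any index that exists in the vector.
(map #(nth % 2) matches)
;; => ("t" "t")

;; get returns nil for a missing index, whereas calling the vector as a
;; function with an out-of-range index throws IndexOutOfBoundsException.
(map #(get % 5) matches)
;; => (nil nil)
```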

Or you can destructure the matches (here using a for macro to go over all the matches but this could also be done in a let or function argument binding):

user> (for [[_ word _] (re-seq #"the (\w+(t))" "the cat sat on the mat")]
        word)
("cat" "mat")
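The let and function-argument forms of destructuring mentioned above might look roughly like this. The helper name captured-word is hypothetical, and the regex is the grouped one from the earlier example.

```clojure
;; Destructuring in a function argument: each match vector is taken apart
;; as [whole-match group-1 group-2]; only group-1 is kept.
(defn captured-word [[_ word _]]
  word)

(map captured-word (re-seq #"the (\w+(t))" "the cat sat on the mat"))
;; => ("cat" "mat")

;; Destructuring the first match in a let binding.
(let [[whole word suffix] (first (re-seq #"the (\w+(t))" "the cat sat on the mat"))]
  [whole word suffix])
;; => ["the cat" "cat" "t"]
```

If the pattern does not match at all, re-seq returns nil, and the let form above simply binds whole, word and suffix to nil.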