Consider a sentence stored as a lazy sequence: each word is one element, and any punctuation stays attached to its word:
("It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!")
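The sequence should now be partitioned into sentences. I wrote a helper function, last-punctuated?, that checks whether the last character of a word is a non-alphabetic character. (That part works.)
Desired result: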
(("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))
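Everything should stay lazy. Unfortunately I can't use partition-by: it splits right before the element at which the predicate's result changes, so the punctuated word is not treated as the last element of its subsequence.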
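To make the problem concrete, here is a minimal sketch of what partition-by does with the sample input. The body of last-punctuated? below is only an assumed implementation (the original helper is not shown), and words is simply the sample sentence from above.

```clojure
;; Assumed implementation of the helper: true when the last character
;; of the word is not a letter. The real helper may differ.
(defn last-punctuated? [word]
  (boolean
    (when-let [c (last word)]
      (not (Character/isLetter (char c))))))

(def words
  ["It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!"])

;; partition-by starts a new group whenever the predicate's value changes,
;; so punctuated words are cut off from the sentence they end.
(partition-by last-punctuated? words)
;; => (("It's" "time" "when" "it's") ("time!") ("What" "did" "you") ("say?" "Nothing!"))
```

Note that "time!" ends up in a group of its own, and "say?" and "Nothing!" are merged, which is not the desired grouping.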
Answer 0 (score: 1)
I would suggest using lazy-seq. I can't think of anything better than this (and it may not be the best approach):
(defn parts [items pred]
  (lazy-seq
    (when (seq items)
      (let [[l r] (split-with (complement pred) items)]
        (cons (concat l (take 1 r))
              (parts (rest r) pred))))))
In the REPL:
user> (let [items '("It's" "time" "when" "it's"
                    "time!" "What" "did" "you"
                    "say?" "Nothing!")]
        (parts items (comp #{\? \! \. \,} last)))
(("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))
user> (let [items '("what?" "It's" "time" "when" "it's"
                    "time!" "What" "did" "you"
                    "say?" "Nothing!")]
        (parts items (comp #{\? \! \. \,} last)))
(("what?") ("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))
user> (let [items '("what?" "It's" "time" "when" "it's"
                    "time!" "What" "did" "you"
                    "say?" "Nothing!")]
        (realized? (parts items (comp #{\? \! \. \,} last))))
false
Update: the same approach written with iterate might be better.
(defn parts [items pred]
  (->> [nil items]
       (iterate (fn [[_ items]]
                  (let [[l r] (split-with (complement pred) items)]
                    [(concat l (take 1 r)) (rest r)])))
       rest
       (map first)
       (take-while seq)))
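As a quick check that this stays lazy, the following sketch runs the iterate version on an infinite input. The name ends-sentence? and the three-word sample are only illustrative assumptions; the predicate is the same one used in the REPL session.

```clojure
;; Same definition as the iterate version, repeated so that the
;; snippet runs on its own.
(defn parts [items pred]
  (->> [nil items]
       (iterate (fn [[_ items]]
                  (let [[l r] (split-with (complement pred) items)]
                    [(concat l (take 1 r)) (rest r)])))
       rest
       (map first)
       (take-while seq)))

;; Hypothetical name for the predicate used in the REPL session.
(def ends-sentence? (comp #{\? \! \. \,} last))

;; cycle yields an infinite lazy sequence; take only asks for the
;; first three sentences, so the call returns.
(take 3 (parts (cycle ["a" "b!" "c?"]) ends-sentence?))
;; => (("a" "b!") ("c?") ("a" "b!"))
```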
Answer 1 (score: 1)
This can actually be done by generating a new sequence that contains "split tokens" and then running partition-by with a different predicate:
(def punctuation? #{\. \! \?})
(def words ["It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!"])
(defn partition-sentences [ws]
  (->> ws
       (mapcat #(if (punctuation? (last %)) [% :br] [%]))
       (partition-by #(= :br %))
       (take-nth 2)))
(println (take 20 (partition-sentences (repeatedly #(rand-nth words)))))
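Since the call above uses random input, it may help to see the intermediate steps on the fixed sample sentence. This is only a sketch; the names tokens and groups are introduced here for illustration and are not part of the answer's code.

```clojure
(def punctuation? #{\. \! \?})

(def words
  ["It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!"])

;; Step 1: emit the marker :br after every word that ends a sentence.
(def tokens
  (mapcat #(if (punctuation? (last %)) [% :br] [%]) words))
;; tokens => ("It's" "time" "when" "it's" "time!" :br
;;            "What" "did" "you" "say?" :br "Nothing!" :br)

;; Step 2: split on the markers; sentence groups and (:br) groups alternate.
(def groups (partition-by #(= :br %) tokens))
;; groups => (("It's" "time" "when" "it's" "time!") (:br)
;;            ("What" "did" "you" "say?") (:br) ("Nothing!") (:br))

;; Step 3: keep every second group, starting with the first sentence.
(take-nth 2 groups)
;; => (("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))
```

take-nth 2 is safe here because a :br marker is always followed by a word (or by the end of the input), so two marker groups never appear next to each other.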
Answer 2 (score: 0)
When the size of the output differs from the size of the input, the answer is usually to use reduce.
(defn last-word? [word]
  (assert word)
  (or (.endsWith word "!")
      (.endsWith word "?")))
(defn make-sentence [in]
  (reduce (fn [acc ele]
            (let [up-to-current-sentence (vec (butlast acc))
                  last-word-last-sentence (-> acc last last)
                  new-sentence? (when last-word-last-sentence (last-word? last-word-last-sentence))
                  current-sentence (vec (last acc))]
              (if new-sentence?
                (conj acc [ele])
                (conj up-to-current-sentence (conj current-sentence ele)))))
          [] in))
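Unfortunately, reduce has to consume its entire input before it returns, so it cannot be used with lazy (potentially infinite) input. There is a discussion here.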