Question

Consider a sentence stored as a lazy sequence: each word is one entry, but punctuation marks are attached to the words:

("It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!")

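It should now be "partitioned" into sentences. I wrote a helper function last-punctuated? that checks whether the last character is a non-letter character. (That part works fine.)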
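The question does not show last-punctuated? itself. A minimal sketch, assuming "non-letter" means any character for which Java's Character/isLetter returns false, might look like this (a hypothetical reconstruction, not the asker's code):

```clojure
;; Hypothetical reconstruction of the asker's helper (not shown in the question):
;; true when the word's last character is not a letter.
(defn last-punctuated? [word]
  (let [c (last word)]                                      ; last character, or nil for ""
    (boolean (and c (not (Character/isLetter (char c)))))))

(last-punctuated? "time!") ;; => true
(last-punctuated? "It's")  ;; => false (the apostrophe is not the last character)
```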

Desired result:

(("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))

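Everything should stay lazy. Unfortunately I can't use partition-by: it starts a new group whenever the value of the predicate changes, so the split happens before the punctuated word, and the punctuated word is not treated as the last entry of its subsequence.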
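To make the partition-by problem concrete, here is a small illustration, assuming a boolean predicate punctuated? (a hypothetical name) that tests whether the last character is . ! or ?:

```clojure
(def words '("It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!"))

;; Hypothetical predicate for this illustration: last character is . ! or ?
(defn punctuated? [w] (contains? #{\. \! \?} (last w)))

;; partition-by opens a new group every time the predicate's value flips,
;; so punctuated words land in groups of their own (consecutive ones even
;; share a group) instead of closing the sentence they belong to.
(partition-by punctuated? words)
;; => (("It's" "time" "when" "it's") ("time!") ("What" "did" "you") ("say?" "Nothing!"))
```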

Answer 1

I'd suggest using lazy-seq. I can't think of anything better than this (though it may not be the best approach):

(defn parts [items pred]
  (lazy-seq
   (when (seq items)
     (let [[l r] (split-with (complement pred) items)]
       (cons (concat l (take 1 r))
             (parts (rest r) pred))))))

In the REPL:

user> (let [items '("It's" "time" "when" "it's"
                    "time!" "What" "did" "you"
                    "say?" "Nothing!")]
        (parts items (comp #{\? \! \. \,} last)))

(("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))

user> (let [items '("what?" "It's" "time" "when" "it's"
                    "time!" "What" "did" "you"
                    "say?" "Nothing!")]
        (parts items (comp #{\? \! \. \,} last)))

(("what?") ("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))

user> (let [items '("what?" "It's" "time" "when" "it's"
                    "time!" "What" "did" "you"
                    "say?" "Nothing!")]
        (realized? (parts items (comp #{\? \! \. \,} last))))

false
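Because parts only does work when the next sentence is requested, it should also cope with infinite input. A small check, assuming the same punctuation predicate as in the REPL session (bound to the illustrative name sentence-end? here) and using cycle to build an endless word stream:

```clojure
(defn parts [items pred]
  (lazy-seq
   (when (seq items)
     ;; l: words before the next punctuated word; r: that word and everything after it
     (let [[l r] (split-with (complement pred) items)]
       (cons (concat l (take 1 r))          ; close the sentence with the punctuated word
             (parts (rest r) pred))))))     ; the rest is computed only when requested

(def sentence-end? (comp #{\? \! \. \,} last))

;; cycle never ends, yet taking three sentences returns normally.
(take 3 (parts (cycle '("It's" "time!" "What" "did" "you" "say?" "Nothing!")) sentence-end?))
;; => (("It's" "time!") ("What" "did" "you" "say?") ("Nothing!"))
```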

Update: the same approach using iterate might be better:

(defn parts [items pred]
  (->> [nil items]
       (iterate (fn [[_ items]]
                  (let [[l r] (split-with (complement pred) items)]
                    [(concat l (take 1 r)) (rest r)])))
       rest
       (map first)
       (take-while seq)))
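A quick check of the iterate-based version, with comments added; the second call, which ends on a word without punctuation, is an extra case chosen for illustration, and sentence-end? is again just an illustrative name for the predicate used in the REPL session:

```clojure
(defn parts [items pred]
  (->> [nil items]                          ; seed: [chunk remaining-items]
       (iterate (fn [[_ items]]
                  (let [[l r] (split-with (complement pred) items)]
                    [(concat l (take 1 r)) (rest r)])))
       rest                                 ; drop the seed, whose chunk is nil
       (map first)                          ; keep only the chunks
       (take-while seq)))                   ; stop at the first empty chunk

(def sentence-end? (comp #{\? \! \. \,} last))

(parts '("It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!") sentence-end?)
;; => (("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))

;; A trailing fragment without punctuation still comes out as the last group.
(parts '("a" "b!" "c") sentence-end?)
;; => (("a" "b!") ("c"))
```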

Answer 2

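This problem can actually be solved by producing a new sequence that contains "split tokens" (here the keyword :br) and then doing partition-by with a different predicate: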

(def punctuation? #{\. \! \?})

(def words ["It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!"])

(defn partition-sentences [ws]
  (->> ws
    (mapcat #(if (punctuation? (last %)) [% :br] [%]))
    (partition-by #(= :br %))
    (take-nth 2)))


;; print the first 20 sentences built from an infinite stream of random words
(println (take 20 (partition-sentences (repeatedly #(rand-nth words)))))
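The three steps of partition-sentences can be traced on a short input; the input ws and the intermediate names marked and grouped are chosen only for this walk-through:

```clojure
(def punctuation? #{\. \! \?})

(def ws ["It's" "time!" "What" "say?"])

;; Step 1: insert a :br marker after every word that ends a sentence.
(def marked (mapcat #(if (punctuation? (last %)) [% :br] [%]) ws))
;; => ("It's" "time!" :br "What" "say?" :br)

;; Step 2: split into runs of words and runs of markers.
(def grouped (partition-by #(= :br %) marked))
;; => (("It's" "time!") (:br) ("What" "say?") (:br))

;; Step 3: keep every other group, dropping the (:br) groups.
(take-nth 2 grouped)
;; => (("It's" "time!") ("What" "say?"))
```

Every :br directly follows a word, so word groups and marker groups strictly alternate and the first group is always a sentence; that is what makes take-nth 2 safe here.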

Answer 3

When the size of the output differs from the size of the input, the answer is usually reduce.

(defn last-word? [word]
  (assert word)
  (or (.endsWith word "!")
      (.endsWith word "?")))

(defn make-sentence [in]
  (reduce (fn [acc ele]
            (let [up-to-current-sentence (vec (butlast acc))
                  last-word-last-sentence (-> acc last last)
                  new-sentence? (when last-word-last-sentence (last-word? last-word-last-sentence))
                  current-sentence (vec (last acc))]
              (if new-sentence?
                (conj acc [ele])
                (conj up-to-current-sentence (conj current-sentence ele)))))
          [] in))

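Unfortunately reduce needs the input to end, so this approach cannot work on lazy (possibly infinite) input. There is a discussion here.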
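Applied to the question's finite example, make-sentence returns nested vectors. The two functions are repeated from this answer, with comments added, so the snippet runs on its own:

```clojure
(defn last-word? [word]
  (assert word)
  (or (.endsWith word "!")
      (.endsWith word "?")))

(defn make-sentence [in]
  (reduce (fn [acc ele]
            (let [up-to-current-sentence (vec (butlast acc))          ; all finished sentences
                  last-word-last-sentence (-> acc last last)          ; previous word, or nil at the start
                  new-sentence? (when last-word-last-sentence (last-word? last-word-last-sentence))
                  current-sentence (vec (last acc))]
              (if new-sentence?
                (conj acc [ele])                                        ; previous word closed a sentence
                (conj up-to-current-sentence (conj current-sentence ele)))))  ; extend the open sentence
          [] in))

(make-sentence '("It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!"))
;; => [["It's" "time" "when" "it's" "time!"] ["What" "did" "you" "say?"] ["Nothing!"]]
```

Since reduce walks the entire input before returning anything, calling make-sentence on an infinite sequence such as one built with cycle would never return.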

Splitting a lazy sequence after the predicate's truth value changes
