Numbering strings within a vector in Clojure

Time: 2017-10-27 03:33:31

Tags: string vector clojure count

Given the following string:

(def text "this is the first sentence . And this is the second sentence")

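I want to count how many times a word such as "this" appears in the text by appending the running count after each occurrence of the word, like this: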

["this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" ...]

As a first step, I tokenized the string:

(def words (clojure.string/split text #" "))

Then I created a helper function to get the number of times "this" appears in the text:

(defn count-this [] (count (re-seq #"this" text)))

Finally, I tried to use the result of the count-this function in this loop:

(for [x words]
  (if (= x "this")
    (str "this: " (apply str (take (count-this) (iterate inc 0))))
    x))

This is what I get:

("this: 01" "is" "the" "first" "sentence" "." "And" "this: 01" "is" ...)

5 answers:

Answer 0 (score: 1)

This can be done fairly concisely with reduce, threading a counter through the traversal of the vector and building the new strings as needed:

(def text "this is the first sentence. And this is the second sentence.")

(defn notate-occurences [word string]
  (->
    (reduce 
        (fn [[count string'] member] 
            (if (= member word) 
              (let [count' (inc count)]
                [count' (conj string' (str member ": " count'))])
              [count (conj string' member)]))
          [0 []]
          (clojure.string/split string #" "))
    second))

(notate-occurences "this" text) 
;; ["this: 1" "is" "the" "first" "sentence." "And" "this: 2" "is" "the" "second" "sentence."]

Answer 1 (score: 1)

You need to keep some state. reduce, loop/recur and iterate all let you do that. iterate simply moves from one state to the next. Here is the transition function:

(defn transition [word]
  (fn [[[head & tail] counted out]]
    (let [[next-counted to-append] (if (= word head)
                                    [(inc counted) (str head ": " (inc counted))]
                                    [counted head])]
      [tail next-counted (conj out to-append)])))

Then you can use iterate to apply this function repeatedly until the input is exhausted:

(require '[clojure.string :as s])

(let [in (s/split "this is the first sentence . And this is the second sentence" #" ")
      step (transition "this")]
  (->> (iterate step [in 0 []])
       (drop-while (fn [[[head & _] _ _]]
                     head))
       (map #(nth % 2))
       first))

;; => ["this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" "is" "the" "second" "sentence"]
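To see what iterate is stepping through, it can help to look at the first few states for a very short input. This sketch repeats the transition function from above so that it runs on its own; the two-word input is made up for illustration:

```clojure
(defn transition [word]
  (fn [[[head & tail] counted out]]
    (let [[next-counted to-append] (if (= word head)
                                     [(inc counted) (str head ": " (inc counted))]
                                     [counted head])]
      [tail next-counted (conj out to-append)])))

;; Each state is [remaining-input count output-so-far].
;; The third state has nil as its remaining input, which is
;; where drop-while stops in the answer above.
(def states (take 3 (iterate (transition "this") [["this" "is"] 0 []])))
;; states => ([["this" "is"] 0 []]
;;            [("is") 1 ["this: 1"]]
;;            [nil 1 ["this: 1" "is"]])
```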

Answer 2 (score: 1)

(defn split-by-word [word text]
    (remove empty?
        (flatten
            (map #(if (number? %) (str word ": " (+ 1 %)) (clojure.string/split (clojure.string/trim %) #" "))
                 (butlast (interleave
                      (clojure.string/split (str text " ") (java.util.regex.Pattern/compile (str "\\b" word "\\b")))
                      (range)))))))

Answer 3 (score: 0)

The problem with this approach is that (apply str (take (count-this) (iterate inc 0))) evaluates to the same thing every time.
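A quick check makes this concrete. For the question's text, count-this returns 2, so the expression is just the first two elements of 0, 1, 2, ... joined together. The sketch below hard-codes that 2 and a short token vector as assumptions:

```clojure
;; (count-this) is 2 for the question's text, so the suffix is
;; always the string made of the first two naturals.
(def suffix (apply str (take 2 (iterate inc 0))))
;; suffix => "01"

;; Nothing in the expression changes between iterations,
;; so every "this" receives the same suffix.
(def results (for [x ["this" "is" "this"]]
               (if (= x "this") (str "this: " suffix) x)))
;; results => ("this: 01" "is" "this: 01")
```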

When you want full control over your variables, a loop form is the usual choice.

For example:

(require '[clojure.string :as str])

(defn add-indexes [word phrase]
  (let [words (str/split phrase #"\s+")]
    (loop [src words
           dest []
           counter 1]
      (if (seq src)
        (if (= word (first src))
          (recur (rest src) (conj dest (str word " " counter)) (inc counter))
          (recur (rest src) (conj dest (first src)) counter))
        dest))))

user=> (add-indexes "this" "this is the first sentence . And this is the second sentence")
["this 1" "is" "the" "first" "sentence" "." "And" "this 2" "is" "the" "second" "sentence"]

loop lets you specify the value of every loop variable on each pass, so your own logic decides whether each one changes.

If you are willing to use Java interop and do something that feels a bit like cheating, this also works:

(defn add-indexes2 [word phrase]
  (let [count (java.util.concurrent.atomic.AtomicInteger. 1)]
    (map #(if (= word %) (str % " " (.getAndIncrement count)) %)
         (str/split phrase #"\s+"))))

user=> (add-indexes2 "this" "this is the first sentence . And this is the second sentence")
("this 1" "is" "the" "first" "sentence" "." "And" "this 2" "is" "the" "second" "sentence")

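Using a mutable counter is not pure, but on the other hand the counter never escapes the function's context, so its behavior cannot be changed from outside.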

Answer 4 (score: 0)

Often you can find a simple way to compose a very concise solution from existing Clojure functions.

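Here are two very short solutions to your problem. First, if you don't need the result as a sequence and a string with the replacements is enough: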

(require 'clojure.string)

(def text "this is the first sentence . And this is the second sentence")

(defn replace-token [ca token]
  (swap! ca inc)
  (str token ": " @ca))

(defn count-this [text]
  (let [counter     (atom 0)
        replacer-fn (partial replace-token counter)]
    (clojure.string/replace text #"this" replacer-fn)))

(count-this text)
; => "this: 1 is the first sentence . And this: 2 is the second sentence"

The solution above takes advantage of the fact that clojure.string/replace accepts a function as the replacement.
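What the replacement function receives depends on the regex: with no capture groups it gets the matched string (as replace-token does above), and with groups it gets a vector of the whole match followed by each group. A small sketch with made-up inputs:

```clojure
(require '[clojure.string :as str])

;; No groups: the function receives the matched string itself.
(def upper (str/replace "this and this" #"this" str/upper-case))
;; upper => "THIS and THIS"

;; With groups: the function receives [whole-match group1 group2].
(def swapped (str/replace "a1 b2" #"(\w)(\d)"
                          (fn [[_ letter digit]] (str digit letter))))
;; swapped => "1a 2b"
```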

Second, if you need the result as a sequence, tokenizing adds some overhead:

(defn count-seq [text]
  (let [counter      (atom 0)
        replacer-fn  (partial replace-token counter)
        converter    (fn [tokens] (map #(if (not= % "this")
                                            % 
                                            (replacer-fn %))
                                       tokens))]
    (-> text
        (clojure.string/split #" ")
        (converter))))

(count-seq text)

; => ("this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" "is" "the" "second" "sentence")

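The loop-recur pattern is very common among new Clojurians coming from non-functional languages. In most cases there is a cleaner, more idiomatic solution that uses functional processing with map, reduce and friends.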
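As one illustration of that style (a sketch of my own, not taken from the answers above), the running count can be computed with reductions and zipped back onto the tokens with map, with no atom, loop or mutable counter. The function name number-word is hypothetical:

```clojure
(require '[clojure.string :as str])

(defn number-word
  "Hypothetical helper: tags each occurrence of word in tokens with
  its running count, leaving all other tokens unchanged."
  [word tokens]
  (let [;; reductions yields the seed 0 followed by the count after
        ;; each token; rest drops the seed so counts lines up with tokens
        counts (rest (reductions (fn [n t] (if (= t word) (inc n) n))
                                 0
                                 tokens))]
    (map (fn [t n] (if (= t word) (str t ": " n) t)) tokens counts)))

(def tagged
  (number-word "this"
               (str/split "this is the first sentence . And this is the second sentence" #" ")))
;; tagged => ("this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" "is" "the" "second" "sentence")
```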

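As the other answers note, the main problem in your original attempt is how your counter is bound. In fact, (iterate inc 0) is not bound to anything. Look at the examples above and think carefully about the scope in which the counter atom is bound. For reference, here is an example of using closures, which could also be used in this situation with great success!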

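As a footnote to the examples above: for cleaner code, you should build a more general solution by extracting and reusing the parts that count-seq and count-this have in common. In addition, the local converter function could be extracted out of count-seq. replace-token already works for any token, but consider how to extend the whole solution to match text other than "this". These are left as exercises for the reader.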
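One possible direction for that exercise, sketched under my own assumptions: take the word as a parameter, escape it with java.util.regex.Pattern/quote so any regex metacharacters are taken literally, and wrap it in \b word boundaries so that, for example, "this" inside "thistle" is not counted. The boundaries assume the word starts and ends with word characters, and the name count-word is hypothetical:

```clojure
(require '[clojure.string :as str])

(defn count-word
  "Hypothetical generalisation of count-this: appends a running count
  to every whole-word occurrence of word in text."
  [word text]
  (let [counter (atom 0)
        ;; \b...\b restricts matches to whole words; Pattern/quote
        ;; escapes regex metacharacters in word
        pattern (re-pattern (str "\\b" (java.util.regex.Pattern/quote word) "\\b"))]
    (str/replace text pattern
                 (fn [match] (str match ": " (swap! counter inc))))))

(count-word "this" "this thistle and this")
;; => "this: 1 thistle and this: 2"
```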