Given the following string:
(def text "this is the first sentence . And this is the second sentence")
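I want to count how many times a word such as "this" occurs in the text by appending the running count to each occurrence of the word. Like this: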
["this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" ...]
As a first step, I tokenize the string:
(require '[clojure.string :refer [split]])
(def words (split text #" "))
Then I created a helper function to get the number of times "this" occurs in the text:
(defn count-this [] (count (re-seq #"this" text)))
Finally, I tried to use the result of count-this in this for expression:
(for [x words]
  (if (= x "this")
    (str "this: " (apply str (take (count-this) (iterate inc 0))))
    x))
This is what I get:
("this: 01" "is" "the" "first" "sentence" "." "And" "this: 01" "is" ...)
Answer 0 (score: 1)
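Apart from building the new strings as needed, this can be done fairly concisely by using reduce to thread a counter through a traversal of the vector: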
(def text "this is the first sentence. And this is the second sentence.")
(defn notate-occurences [word string]
  (->
   (reduce
    (fn [[count string'] member]
      (if (= member word)
        (let [count' (inc count)]
          [count' (conj string' (str member ": " count'))])
        [count (conj string' member)]))
    [0 []]
    (clojure.string/split string #" "))
   second))
(notate-occurences "this" text)
;; ["this: 1" "is" "the" "first" "sentence." "And" "this: 2" "is" "the" "second" "sentence."]
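To see how reduce threads the counter, reductions (which returns every intermediate accumulator rather than only the last one) can trace the [count vector] pairs. This is a minimal sketch: the step function notate-step is a hypothetical name that simply mirrors the anonymous reducing function above, and the three-token input is made up for illustration.

```clojure
;; Hypothetical reducing step, equivalent to the anonymous fn in notate-occurences.
(defn notate-step [word]
  (fn [[n acc] token]
    (if (= token word)
      ;; Match: bump the counter and append the annotated token.
      (let [n' (inc n)]
        [n' (conj acc (str token ": " n'))])
      ;; No match: keep the counter, append the token unchanged.
      [n (conj acc token)])))

;; reductions exposes each intermediate [count vector] state.
(def trace (reductions (notate-step "this") [0 []] ["this" "is" "this"]))
;; => ([0 []] [1 ["this: 1"]] [1 ["this: 1" "is"]] [2 ["this: 1" "is" "this: 2"]])
```

The final element of the trace is exactly what reduce returns, and second then picks out the vector.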
Answer 1 (score: 1)
You need to keep some state. reduce, loop/recur and iterate can all do this. iterate simply moves from one state to the next. Here is the transition function:
(defn transition [word]
  (fn [[[head & tail] counted out]]
    (let [[next-counted to-append] (if (= word head)
                                     [(inc counted) (str head ": " (inc counted))]
                                     [counted head])]
      [tail next-counted (conj out to-append)])))
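Calling the transition function by hand shows the shape of the state vector, [remaining-tokens count output]. This sketch repeats transition so it runs on its own; the two-token input is only an illustration.

```clojure
(defn transition [word]
  (fn [[[head & tail] counted out]]
    (let [[next-counted to-append] (if (= word head)
                                     [(inc counted) (str head ": " (inc counted))]
                                     [counted head])]
      [tail next-counted (conj out to-append)])))

;; First step consumes "this", increments the count and records "this: 1".
(def one-step ((transition "this") [["this" "is"] 0 []]))
;; => [("is") 1 ["this: 1"]]

;; Second step consumes "is"; the remaining input is now nil.
(def two-steps ((transition "this") one-step))
;; => [nil 1 ["this: 1" "is"]]
```

Once the input is exhausted, head destructures to nil, which is the condition the drop-while predicate tests.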
Then you can use iterate to apply this function repeatedly until there is no input left:
(require '[clojure.string :as s])

(let [in (s/split "this is the first sentence . And this is the second sentence" #" ")
      step (transition "this")]
  (->> (iterate step [in 0 []])
       (drop-while (fn [[[head & _] _ _]]
                     head))
       (map #(nth % 2))
       first))
;; => ["this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" "is" "the" "second" "sentence"]
Answer 2 (score: 1)
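(defn split-by-word [word text]
  (remove empty?
          (flatten
           (map #(if (number? %)
                   ;; an index marks a removed match: 0 -> "word: 1", 1 -> "word: 2", ...
                   (str word ": " (+ 1 %))
                   ;; text between matches: split it back into tokens
                   (clojure.string/split (clojure.string/trim %) #" "))
                (butlast (interleave
                          ;; the appended " " keeps a final match from vanishing as a dropped trailing ""
                          (clojure.string/split (str text " ")
                                                (java.util.regex.Pattern/compile (str "\\b" word "\\b")))
                          (range)))))))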
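Since this answer comes without an explanation, here is a small trace of its intermediate values on a made-up sample string. Splitting on the word-boundary pattern leaves the text between matches, interleave slots an index into each gap where a match was removed, and butlast drops the extra index after the last piece:

```clojure
(require 'clojure.string)

;; Leading "" appears because the text starts with a (non-zero-width) match.
(def pieces
  (clojure.string/split "this is a this b " (re-pattern "\\bthis\\b")))
;; => ["" " is a " " b "]

;; Each number sits where one occurrence of "this" used to be.
(def slots (butlast (interleave pieces (range))))
;; => ("" 0 " is a " 1 " b ")
```

split-by-word then turns each number n into "this: n+1", re-splits the text pieces, and removes the empty strings.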
Answer 3 (score: 0)
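The problem with this approach is that (apply str (take (count-this) (iterate inc 0))) evaluates to the same thing every time: (count-this) always counts the occurrences in the whole text, and (iterate inc 0) starts again from 0 on every evaluation, so every match gets the same label.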
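Concretely, assuming count-this takes no arguments (as the call (count-this) in the question implies), it returns 2 for the question's string, so every match is labelled with the same string:

```clojure
(def text "this is the first sentence . And this is the second sentence")

;; Counts ALL occurrences in the text, independent of the token being visited.
(defn count-this [] (count (re-seq #"this" text)))

;; Nothing here depends on the current position, so each match gets this value.
(def label (apply str (take (count-this) (iterate inc 0))))
;; => "01"
```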
To have full control over your variables, you would usually use a loop form, for example:
(require '[clojure.string :as str])

(defn add-indexes [word phrase]
  (let [words (str/split phrase #"\s+")]
    (loop [src words
           dest []
           counter 1]
      (if (seq src)
        (if (= word (first src))
          (recur (rest src) (conj dest (str word " " counter)) (inc counter))
          (recur (rest src) (conj dest (first src)) counter))
        dest))))
user=> (add-indexes "this" "this is the first sentence . And this is the second sentence")
["this 1" "is" "the" "first" "sentence" "." "And" "this 2" "is" "the" "second" "sentence"]
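loop lets you specify the value of every loop variable on each pass (through recur), so you can decide, according to your own logic, whether each one changes.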
If you are willing to reach for Java and do something that feels a bit like cheating, this also works:
(defn add-indexes2 [word phrase]
  (let [count (java.util.concurrent.atomic.AtomicInteger. 1)]
    (map #(if (= word %) (str % " " (.getAndIncrement count)) %)
         (str/split phrase #"\s+"))))
user=> (add-indexes2 "this" "this is the first sentence . And this is the second sentence")
("this 1" "is" "the" "first" "sentence" "." "And" "this 2" "is" "the" "second" "sentence")
Using a mutable counter is not pure, but the counter never escapes the function's context, so its behavior cannot be changed from outside.
Answer 4 (score: 0)
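Often you can find a simple way to compose a very concise solution from existing Clojure functions.
Here are two very short solutions to your problem. First, if you don't need the result as a sequence and replacing inside the string is enough: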
(require 'clojure.string)
(def text "this is the first sentence . And this is the second sentence")
(defn replace-token [ca token]
  (swap! ca inc)
  (str token ": " @ca))
(defn count-this [text]
  (let [counter (atom 0)
        replacer-fn (partial replace-token counter)]
    (clojure.string/replace text #"this" replacer-fn)))
(count-this text)
; => "this: 1 is the first sentence . And this: 2 is the second sentence"
The solution above takes advantage of the fact that clojure.string/replace accepts a function as the replacement.
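A minimal illustration of that feature on made-up strings: when the replacement is a function, clojure.string/replace calls it with each match and inserts whatever it returns. For a pattern without groups the match is a plain string; with groups it is a vector of the whole match followed by the groups.

```clojure
(require 'clojure.string)

;; No groups: the fn receives the matched string.
(def shouted
  (clojure.string/replace "a b a" #"a" (fn [m] (str m "!"))))
;; => "a! b a!"

;; With groups: the fn receives [whole-match group1 group2].
(def swapped
  (clojure.string/replace "x1 y2" #"([a-z])(\d)"
                          (fn [[_ letter digit]] (str digit letter))))
;; => "1x 2y"
```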
Second, if you need the result as a sequence, tokenizing adds a little overhead:
(defn count-seq [text]
  (let [counter (atom 0)
        replacer-fn (partial replace-token counter)
        converter (fn [tokens] (map #(if (not= % "this")
                                        %
                                        (replacer-fn %))
                                     tokens))]
    (-> text
        (clojure.string/split #" ")
        (converter))))
(count-seq text)
; => ("this: 1" "is" "the" "first" "sentence" "." "And" "this: 2" "is" "the" "second" "sentence")
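The loop/recur pattern is very common among Clojurians coming from non-functional languages. In most cases there is a cleaner, more idiomatic solution using functional processing with map, reduce and friends.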
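As one example of that style (a sketch, not taken from any of the answers; the name annotate is hypothetical), the running count can be computed with reductions over 1/0 flags and then zipped back onto the tokens with map, with no mutable state at all:

```clojure
(require 'clojure.string)

(defn annotate [word tokens]
  (let [;; 1 for each occurrence of word, 0 otherwise
        flags  (map #(if (= word %) 1 0) tokens)
        ;; running total: the n-th occurrence sees n
        counts (reductions + flags)]
    (map (fn [token n] (if (= word token) (str token ": " n) token))
         tokens counts)))

(def result
  (annotate "this"
            (clojure.string/split
             "this is the first sentence . And this is the second sentence" #" ")))
```

Here the counter is just another lazy sequence walked in step with the tokens, so the result is the same no matter how or when it is realized.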
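As the other answers point out, the main problem in the original attempt is how your counter is bound. In fact, (iterate inc 0) is not bound to anything, so a fresh sequence starting at 0 is created for every match. Looking at the examples above, think carefully about the scope in which the atom counter is bound. For reference, here is an example of using closures, which can also be used to great effect in this situation.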
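Independently of the linked example, here is a hypothetical sketch of the closure idea: a function closes over its own private atom and hands out the next number on each call, which is essentially what replace-token achieves via partial.

```clojure
;; Hypothetical helper: returns a fresh counter fn that closes over a private atom.
(defn make-counter []
  (let [n (atom 0)]
    (fn [] (swap! n inc))))   ; swap! returns the new value

(def next-id (make-counter))
(def ids [(next-id) (next-id) (next-id)])
;; => [1 2 3]
```

Each call to make-counter creates a new atom, so separate counters never interfere with each other.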
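As a footnote to the examples above: for cleaner code, you should work toward a more general solution by extracting and reusing the common parts of the count-seq and count-this functions. The local converter function could also be extracted from count-seq. replace-token already works for any token, but consider how you would extend the whole solution to match text other than "this". These are left as exercises for the reader.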
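One possible direction for that exercise, sketched under assumptions: the word is quoted with java.util.regex.Pattern/quote so any literal text can be matched, it is wrapped in \b so that "this" does not match inside "thistle", and the counter lives in a local atom. The name annotate-text is hypothetical, and \b only behaves as expected when the word starts and ends with word characters.

```clojure
(require 'clojure.string)

(defn annotate-text [word text]
  (let [counter (atom 0)
        ;; literal word (\Q...\E quoting via Pattern/quote) between word boundaries
        pattern (re-pattern (str "\\b" (java.util.regex.Pattern/quote word) "\\b"))]
    (clojure.string/replace text pattern
                            (fn [m] (str m ": " (swap! counter inc))))))

(def out (annotate-text "this" "this thistle and this"))
;; => "this: 1 thistle and this: 2"
```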