I want to parallelize my Clojure implementation

Time: 2016-10-18 10:51:24

Tags: clojure future

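OK, so I have an algorithm that loops through a file line by line and looks for a given word in each line. It returns not only the given word but also the words before and after it (how many is also passed as a parameter).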

E.g. line = "I am overflowing with blessings and you also are"
     parameters = ("you" 2)
     output = (blessings and you also are)

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (line-seq r)]
    (let [x (topMostLoop l "good" 2)]
      (if (not (empty? x))
        (println x)))))

The code above works fine. But I want to parallelize it, so I did the following:

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (line-seq r)]
    (future
      (let [x (topMostLoop l "good" 2)]
        (if (not (empty? x))
          (println x))))))

Now the output comes out all jumbled. I know I need to lock somewhere, but I don't know where.

(defn topMostLoop [contents word next]
  (let [mywords (str/split contents #"[ ,\\.]+")]
    (map (fn [element]
           (return-lines (max 0 (- element next))
                         (min (+ element next) (- (count mywords) 1))
                         mywords))
         (vec ((indexHashMap mywords) word)))))

I would be glad if someone could help me with this; it is the last thing I have left to do.

NB: Let me know if I need to post the other functions as well.

For clarity, I have added the other functions:

(defn return-lines [firstItem lastItem contentArray]
  (take (+ (- lastItem firstItem) 1) 
        (map (fn [element] (str element))
             (vec (drop firstItem contentArray)))))

(defn indexHashMap [mywords]
  (->> (zipmap (range) mywords)     ;contents is a list of words
       (reduce (fn [index [location word]]
                 (merge-with concat index {word (list location)})) {})))

1 answer:

Answer 0: (score: 3)

First, rewrite your first example, the serial version, to use map:

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (map #(topMostLoop %1 "good" 2) (line-seq r))]
    (when (seq l)
      (println l))))

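With this approach, the topMostLoop function is applied to each line and map returns a lazy seq of the results. In the doseq body, each result is printed if it is not empty.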
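Because `map` is lazy, nothing is computed until something asks for the next element, and that has to happen while the reader is still open. In the code above `doseq` does the consuming inside `with-open`. A minimal sketch of the same idea, assuming a hypothetical `matching-lines` helper (not part of the question) and a `StringReader` standing in for the file:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn matching-lines
  "Returns a vector of the lines read from source that contain word.
  Hypothetical helper, used only for this illustration."
  [source word]
  (with-open [r (io/reader source)]
    ;; filterv is eager, so every line is read before with-open closes r.
    (filterv #(str/includes? % word) (line-seq r))))

(def example-result
  (matching-lines (java.io.StringReader. "good day\nbad day\nall good") "good"))
```

Returning an unrealized lazy seq from inside `with-open` would instead defer the reads until after the reader has been closed.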

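After that, replace map with pmap. It runs the mapping in parallel, and the results still come back in the same order as the input lines: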

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (pmap #(topMostLoop %1 "good" 2) (line-seq r))]
    (when (seq l)
      (println l))))

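In your version with futures, the results come out of order, which is expected: some futures started later finish executing before ones started earlier.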
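If you want to keep explicit futures, one way to get ordered output is to let the worker threads only compute, and to do all the printing from the calling thread while dereferencing the futures in the order they were created. A minimal sketch, assuming a hypothetical `find-context` function standing in for `topMostLoop` and an in-memory vector of lines instead of the file:

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for topMostLoop: returns the words within n
;; positions of the first occurrence of word, or nil if it is absent.
(defn find-context [line word n]
  (let [words (str/split line #"[ ,.]+")
        i     (.indexOf words word)]
    (when-not (neg? i)
      (subvec words (max 0 (- i n)) (min (count words) (+ i n 1))))))

(def sample-lines
  ["I am overflowing with blessings and you also are"
   "nothing to see here"
   "are you there"])

(defn contexts-in-order [lines word n]
  ;; mapv is eager, so every future is started before any is dereferenced.
  (let [futs (mapv #(future (find-context % word n)) lines)]
    ;; Dereferencing in creation order restores the input order.
    (mapv deref futs)))

;; All printing happens on the calling thread, one result at a time.
(doseq [ctx (contexts-in-order sample-lines "you" 2)]
  (when (seq ctx)
    (println ctx)))
```

If order does not matter and you only want to keep lines from interleaving, wrapping the `println` call in `(locking *out* ...)` is another option. In a standalone script, calling `(shutdown-agents)` once all work is done avoids the JVM lingering on the thread pool that futures use.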

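I tested this with the following modification (instead of reading a text file, I create a lazy sequence of vectors of numbers, search each vector for a value, and return the surrounding elements):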

(def lines (repeatedly #(shuffle (range 1 11))))
(def lines-10 (take 10 lines))

lines-10
([5 8 3 10 6 9 7 2 1 4]
[6 8 9 7 2 5 10 4 1 3]
[2 7 8 9 1 5 10 3 4 6]
[10 8 3 5 7 2 4 9 6 1]
[8 6 10 1 9 4 3 7 2 5]
[9 6 8 1 5 10 3 4 2 7]
[10 9 3 7 1 8 4 6 5 2]
[6 1 4 10 3 7 8 9 5 2]
[9 6 7 5 8 3 10 4 2 1]
[4 1 5 2 7 3 6 9 8 10])

(defn surrounding
 [v value size]
  (let [i (.indexOf v value)]
   (if (= i -1)
    nil
    (subvec v (max (- i size) 0) (inc (min (+ i size) (dec (count v))))))))

(doseq [l (map #(surrounding % 3 2) lines-10)] (if (not (empty? l)) (println l)))
[5 8 3 10 6]
[4 1 3]
[5 10 3 4 6]
[10 8 3 5 7]
[9 4 3 7 2]
[5 10 3 4 2]
[10 9 3 7 1]
[4 10 3 7 8]
[5 8 3 10 4]
[2 7 3 6 9]
nil

(doseq [l (pmap #(surrounding % 3 2) lines-10)] (if (not (empty? l)) (println l)))
[5 8 3 10 6]
[4 1 3]
[5 10 3 4 6]
[10 8 3 5 7]
[9 4 3 7 2]
[5 10 3 4 2]
[10 9 3 7 1]
[4 10 3 7 8]
[5 8 3 10 4]
[2 7 3 6 9]
nil
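`pmap` adds coordination overhead per element, so it tends to pay off only when the work per line is substantial. A possible refinement is to hand each worker a batch of lines instead of a single line. This is a sketch that reuses the logic of `surrounding` from above; `surrounding-batched` and its `batch-size` parameter are hypothetical names introduced for illustration, and the batch size shown is arbitrary:

```clojure
;; Same logic as the surrounding function above, repeated so this runs alone.
(defn surrounding [v value size]
  (let [i (.indexOf v value)]
    (when-not (neg? i)
      (subvec v (max (- i size) 0) (inc (min (+ i size) (dec (count v))))))))

(defn surrounding-batched
  "Like (map #(surrounding % value size) lines), but computed in parallel
  one batch at a time. batch-size is a tuning knob, not a recommended value."
  [lines value size batch-size]
  (->> (partition-all batch-size lines)
       ;; mapv forces each batch inside the worker thread.
       (pmap (fn [batch] (mapv #(surrounding % value size) batch)))
       ;; Flatten one level; the original order is preserved.
       (apply concat)))

(def sample [[5 8 3 10 6 9 7 2 1 4]
             [6 8 9 7 2 5 10 4 1 3]
             [1 2 4]])

(doseq [l (surrounding-batched sample 3 2 2)]
  (when (seq l)
    (println l)))
```

Whether batching helps depends on how expensive the per-line function is, so it is worth timing both versions on the real file.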