What is the proper way to get side effects while using java.io/reader in Clojure?

Date: 2014-04-09 20:49:37

Tags: clojure

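I am reading lines from a very large text file. The file contains a set of data from which I want to select specific line numbers. What I want to do is read a line from the file; if it is one of the lines I want, conj it onto my result, and if not, check the next line. I don't want to keep every line I have seen in memory, so I would like to drop lines from the reader's line sequence as I read them.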

I have a function like this:

    ;; evaluates but doesn't modify the line sequence so continuously adds
    ;; the same first line to the result. I would like this exact function
    ;; but somehow have it drop the first line of lines at each iteration.
    (defn get-training-data [batch-size batch-num]
      (let [line-numbers (fn that returns vector of random numbers)]
        (with-open [rdr (clojure.java.io/reader "resources/sample.txt")]
          (let [lines (line-seq rdr) res []]
            (for [i (range (apply max line-numbers))
                  :let [res (conj res (json/read-str (first lines)))]
                  :when (some #{i} line-numbers)]
              res)))))

I also have a function like this:

    ;; this works as I want it to, but only with a small file and produces a
    ;; stack overflow with a large file
    (defn get-training-data1 [batch-size batch-num]
      (let [line-numbers (fn that returns a vector of random numbers)]
        (with-open [rdr (clojure.java.io/reader "resources/sample.txt")]
          (let [lines (line-seq rdr)]
            (loop [i 0 f (apply max line-numbers) res [] lines lines]
              (if (> i f)
                res
                (if (some #{i} line-numbers)
                  (recur
                   (inc i)
                   f
                   (conj res (json/read-str (first lines)))
                   (drop 1 lines))
                  (recur
                   (inc i)
                   f
                   res
                   (drop 1 lines)))))))))

While trying to test this, I worked out the following simpler cases:

    ;; works
    (let [res []]
      (for [i (range 10)
            :let [res (conj res i)]
            :when (odd? i)]
        res)) ;;([1] [3] [5] [7] [9])

    ;; now an attempt to get the same result but have a side effect each time,
    ;; produces null pointer exception.
    (let [res []]
      (for [i (range 10)
            :let [res (conj res i)]
            :when (odd? i)]
        (doall
         (println i)
         res)))

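I believe that if I could figure out how to produce a side effect inside for, the first problem would be solved, because I could have the side effect drop the first line of the reader's line sequence.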

Do you have any thoughts?

1 answer:

Answer 0 (score: 5)

map and filter do this job nicely and stay lazy, so you never hold more in memory than you need.

    user> (->> (line-seq (clojure.java.io/reader "project.clj")) ;; lazy sequence of lines
               (map vector (range))                              ;; add an index
               (filter #(#{1 3 7 9} (first %)))                  ;; filter by index
               (map second))                                     ;; drop the index

    ("  :description \"API server for Yummly mobile app(s)\""
     "[com.project/example \"1.4.8-SNAPSHOT\"]"
     "                 [org.clojure/tools.cli \"0.2.4\"]"
     "                 [clojurewerkz/mailer \"1.0.0-alpha3\"]")