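I am reading lines from a very large text file. The file contains a set of data from which I want to select specific line numbers. What I want to do is read a line from the file and, if it is one of the lines I want, conj it onto my result; if it is not, check the next line. I don't want to keep every line I've seen in memory, so I want to drop lines from the reader's line sequence as I read them.
I have a function like this: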
;; evaluates but doesn't modify the line sequence so continuously adds
;; the same first line to the result. I would like this exact function
;; but somehow have it drop the first line of lines at each iteration.
(defn get-training-data [batch-size batch-num]
  (let [line-numbers (fn that returns vector of random numbers)]
    (with-open [rdr (clojure.java.io/reader "resources/sample.txt")]
      (let [lines (line-seq rdr) res []]
        (for [i (range (apply max line-numbers))
              :let [res (conj res (json/read-str (first lines)))]
              :when (some #{i} line-numbers)]
          res)))))
I also have a function like this:
;; this works as I want it to, but only with a small file and produces a
;; stack overflow with a large file
(defn get-training-data1 [batch-size batch-num]
  (let [line-numbers (fn that returns a vector of random numbers)]
    (with-open [rdr (clojure.java.io/reader "resources/sample.txt")]
      (let [lines (line-seq rdr)]
        (loop [i 0 f (apply max line-numbers) res [] lines lines]
          (if (> i f)
            res
            (if (some #{i} line-numbers)
              (recur
                (inc i)
                f
                (conj res (json/read-str (first lines)))
                (drop 1 lines))
              (recur
                (inc i)
                f
                res
                (drop 1 lines)))))))))
While trying to test this, I came up with the following simpler cases:
;; works
(let [res []]
  (for [i (range 10)
        :let [res (conj res i)]
        :when (odd? i)]
    res)) ;; ([1] [3] [5] [7] [9])
;; now an attempt to get the same result but have a side effect each time,
;; produces null pointer exception.
(let [res []]
  (for [i (range 10)
        :let [res (conj res i)]
        :when (odd? i)]
    (doall
      (println i)
      res)))
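I believe that if I could figure out how to produce a side effect inside for, the first problem would be solved, because the side effect could drop the first line of the reader's line sequence.
Do you have any ideas?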
Answer 0 (score: 5)
map and filter can do this job nicely and stay lazy, so you don't hold any more in memory than you need.
user> (->> (line-seq (clojure.java.io/reader "project.clj")) ;; lazy sequence of lines
           (map vector (range))                ;; add an index
           (filter #(#{1 3 7 9} (first %)))    ;; filter by index
           (map second))                       ;; drop the index
("  :description \"API server for Yummly mobile app(s)\""
 "[com.project/example \"1.4.8-SNAPSHOT\"]"
 " [org.clojure/tools.cli \"0.2.4\"]"
 " [clojurewerkz/mailer \"1.0.0-alpha3\"]")