Reading a large file line by line

Date: 2014-09-20 12:35:08

Tags: file clojure io iterator iteration

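I'm trying to write a reader for large files in Clojure, based on iteration. But how can I return the file line by line in Clojure? I want to do something like this:

(println (do_something (readFile (:file opts)))) ; process and print the first line
(println (do_something (readFile (:file opts)))) ; process and print the second line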

Code:

(ns testapp.core
  (:gen-class)
  (:require [clojure.tools.cli :refer [cli]])
  (:require [clojure.java.io]))


(defn readFile [file, cnt]
  ; Iterate over opened file (read line by line)
  (with-open [rdr (clojure.java.io/reader file)]
    (let [seq (line-seq rdr)]
      ; how return only one line there? and after, when needed, take next line?
    )))

(defn -main [& args]
  ; Main function for project 
  (let [[opts args banner] 
        (cli args
          ["-h" "--help" "Print this help" :default false :flag true]
          ["-f" "--file" "REQUIRED: File with data"]
          ["-c" "--clusters" "Count of clusters" :default 3]
          ["-g" "--hamming" "Use Hamming algorithm"]
          ["-e" "--evklid" "Use Evklid algorithm"]
          )]
    ; Print help, when no typed args
    (when (:help opts)
      (println banner)
      (System/exit 0))
    ; Or process args and start work
    (if (and (:file opts) (or (:hamming opts) (:evklid opts)))
      (do
        ; Use Hamming algorithm
        (if (:hamming opts)
          (do
            (println (readFile (:file opts)))
            (println (readFile (:file opts))))
          ;(count (readFile (:file opts)))
          ; Use Evklid algorithm
          (println "Evklid")))
      (println "Please, type path for file and algorithm!")))) 

3 Answers:

Answer 0 (score: 6)

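Maybe I don't understand what you mean by "return line by line", but I'd suggest writing a function that takes the file and a processing function, and prints the result of the processing function for each line of your large file. Or, more generally, let it accept both a processing function and an output function (println by default), so that you can also send the results over the network, save them somewhere, pass them to another thread, and so on: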

(defn process-file-by-lines
  "Process file reading it line-by-line"
  ([file]
   (process-file-by-lines file identity))
  ([file process-fn]
   (process-file-by-lines file process-fn println))
  ([file process-fn output-fn]
   (with-open [rdr (clojure.java.io/reader file)]
     (doseq [line (line-seq rdr)]
       (output-fn
         (process-fn line))))))

So, for example:

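(require 'clojure.string)

(process-file-by-lines "/tmp/tmp.txt") ;; Will just print the file line by line
(process-file-by-lines "/tmp/tmp.txt"
                       clojure.string/reverse) ;; Will print each line reversed
;; (plain `reverse` would print each line as a seq of characters instead)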
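Because the output function is just a parameter, the same helper can collect results instead of printing them. A minimal sketch, assuming a throwaway temp file created with java.io.File/createTempFile; the sample contents and the collecting atom are illustrative only:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn process-file-by-lines
  "Process file reading it line-by-line"
  ([file]
   (process-file-by-lines file identity))
  ([file process-fn]
   (process-file-by-lines file process-fn println))
  ([file process-fn output-fn]
   (with-open [rdr (io/reader file)]
     (doseq [line (line-seq rdr)]
       (output-fn (process-fn line))))))

;; Write a small sample file (illustrative data, deleted when the JVM exits).
(def sample (doto (java.io.File/createTempFile "lines" ".txt")
              (.deleteOnExit)))
(spit sample "first\nsecond\nthird\n")

;; Use an output function that collects results into an atom
;; instead of printing them.
(def results (atom []))
(process-file-by-lines sample str/upper-case #(swap! results conj %))

@results ;; => ["FIRST" "SECOND" "THIRD"]
```

The file is still read one line at a time; only the destination of each processed line changes.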

Answer 1 (score: 4)

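You could also try reading lazily from the reader yourself, which is different from the lazy sequence of strings returned by line-seq: this sequence closes the reader on its own when it reaches the end of the file, instead of relying on with-open. The details are discussed in this answer to a very similar question, but the gist is: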

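(defn lazy-file-lines [file]
  (letfn [(helper [^java.io.BufferedReader rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                ;; Read one line and defer the rest until it is needed
                (cons line (helper rdr))
                ;; End of file: close the reader and end the sequence
                (do (.close rdr) nil))))]
    (helper (clojure.java.io/reader file))))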

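You can then map over the lines, which are read only as they are needed. As discussed in more detail in the linked answer, the downside is that if you never read to the end of the file, (.close rdr) will never run, which can leak the open file handle.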
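If you only need the first few lines, one way to sidestep the unclosed-reader problem is to keep with-open and realize just the part you need before the reader is closed. A sketch under that assumption; the helper name first-n-lines and the sample file are hypothetical:

```clojure
(require '[clojure.java.io :as io])

(defn first-n-lines
  "Hypothetical helper: return the first n lines of file as a vector.
  The lines are realized with vec inside with-open, so the reader is
  always closed, even though the rest of the file is never read."
  [file n]
  (with-open [rdr (io/reader file)]
    (vec (take n (line-seq rdr)))))

;; Illustrative sample file, deleted when the JVM exits.
(def sample (doto (java.io.File/createTempFile "lines" ".txt")
              (.deleteOnExit)))
(spit sample "a\nb\nc\nd\ne\n")

(first-n-lines sample 2) ;; => ["a" "b"]
```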

Answer 2 (score: 2)

Try doseq:

(defn readFile [file]
  (with-open [rdr (clojure.java.io/reader file)]
    (doseq [line (line-seq rdr)]
      (println line))))
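The doseq version only prints. If the caller needs the processed lines back, the sequence has to be fully realized before with-open closes the reader; returning (line-seq rdr) directly would hand back a lazy sequence whose unread lines can no longer be read once the reader is closed. A minimal sketch using the eager mapv; the name read-lines and its process-fn argument are illustrative, not from the original answers:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn read-lines
  "Illustrative helper: apply process-fn to every line of file and
  return the results as a vector. mapv is eager, so every line is read
  while the reader is still open."
  [file process-fn]
  (with-open [rdr (io/reader file)]
    (mapv process-fn (line-seq rdr))))

;; Illustrative sample file, deleted when the JVM exits.
(def sample (doto (java.io.File/createTempFile "lines" ".txt")
              (.deleteOnExit)))
(spit sample "one\ntwo\nthree\n")

(read-lines sample str/reverse) ;; => ["eno" "owt" "eerht"]
```

This keeps every result in memory, so for a truly large file the doseq version (or a reduce over line-seq inside with-open) is the better fit.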