clojure.java.jdbc/query large result set lazily

Time: 2013-11-01 14:15:24

Tags: jdbc clojure

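I'm trying to read millions of rows from a database and write them to a text file.

This is a continuation of my question "database dump to text file with side effects".

My problem now seems to be that logging doesn't happen until the program completes. Another indicator that I'm not processing lazily is that nothing is written to the text file until the program finishes.

Based on tips from IRC, it seems my issue is likely related to :result-set-fn, which defaults to doall in the clojure.java.jdbc/query area of the code.

I tried replacing it with a for function, but memory consumption is still high because the entire result set gets pulled into memory.

How can I have a :result-set-fn that doesn't pull everything in the way doall does? And how can I write the log file progressively while the program is running, rather than dumping everything once -main has finished executing?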

    (let [db-spec              local-postgres
          sql                  "select * from public.f_5500_sf "
          log-report-interval  1000
          fetch-size           100
          field-delim          "\t"
          row-delim            "\n"
          db-connection        (doto (j/get-connection db-spec) (.setAutoCommit false))
          statement            (j/prepare-statement db-connection sql :fetch-size fetch-size)
          joiner               (fn [v] (str (join field-delim v) row-delim))
          start                (System/currentTimeMillis)
          rate-calc            (fn [r] (float (/ r (/ (- (System/currentTimeMillis) start) 100))))
          row-count            (atom 0)
          result-set-fn        (fn [rs] (lazy-seq rs))
          lazy-results         (rest (j/query db-connection [statement]
                                              :as-arrays? true
                                              :row-fn joiner
                                              :result-set-fn result-set-fn))]
      (.setAutoCommit db-connection false)
      (info "Started dbdump session...")
      (with-open [^java.io.Writer wrtr (io/writer "output.txt")]
        (info "Running query...")
        (doseq [row lazy-results]
          (.write wrtr row)))
      (info (format "Completed write with %d rows" @row-count)))

3 answers:

Answer 0: (score: 8)

I picked up the latest fixes for clojure.java.jdbc by adding [org.clojure/java.jdbc "0.3.0-beta1"] to the dependency list in my project.clj. This release enhances/corrects the :as-arrays? true functionality of clojure.java.jdbc/query.

I think this helped somewhat, although I might still have been able to override :result-set-fn with vec.

The core issue was resolved by tucking all the row logic into :row-fn. The initial OutOfMemory problems came from iterating over the j/query result set rather than defining a specific :row-fn.

The new (working) code is below:

(defn -main []
  (let [; {{{
        db-spec              local-postgres
        source-sql           "select * from public.f_5500 "
        log-report-interval  1000
        fetch-size           1000
        row-count            (atom 0)
        field-delim          "\u0001"  ; unlikely to be in source feed,
                                       ; although i should still check in
                                       ; replace-newline below (for when "\t"
                                       ; is used especially)
        row-delim            "\n"      ; unless fixed-width, target doesn't
                                       ; support non-printable chars for recDelim like
        db-connection        (doto (j/get-connection db-spec)
                               (.setAutoCommit false))
        statement            (j/prepare-statement db-connection source-sql
                                                  :fetch-size fetch-size
                                                  :concurrency :read-only)
        start                (System/currentTimeMillis)
        ;; rows per second: elapsed milliseconds are converted to seconds
        rate-calc            (fn [r] (float (/ r (/ (- (System/currentTimeMillis) start) 1000.0))))
        replace-newline      (fn [s] (if (string? s) (clojure.string/replace s #"\n" " ") s))
        row-fn               (fn [v]
                               (swap! row-count inc)
                               (when (zero? (mod @row-count log-report-interval))
                                 (info (format "wrote %d rows" @row-count))
                                 (info (format "\trows/s %.2f" (rate-calc @row-count)))
                                 (info (format "\tPercent Mem used %s " (memory-percent-used))))
                               (str (join field-delim (doall (map #(replace-newline %) v))) row-delim))
        ]; }}}
    (info "Started database table dump session...")
    (with-open [^java.io.Writer wrtr (io/writer "./sql/output.txt")]
      ;; each row is written as soon as it is read, so nothing accumulates
      (j/query db-connection [statement]
               :as-arrays? true
               :row-fn #(.write wrtr (row-fn %))))
    (info (format "\t\t\tCompleted with %d rows" @row-count))
    (info (format "\t\t\tCompleted in %s seconds" (float (/ (- (System/currentTimeMillis) start) 1000))))
    (info (format "\t\t\tAverage rows/s %.2f" (rate-calc @row-count)))
    nil))

Other things I experimented with (with limited success) involved timbre logging and turning off standard out. I wondered whether the REPL might cache the results before displaying them back in my editor (vim fireplace), and I wasn't sure whether that was using a lot of the memory.

I also added the logging of free memory using (.freeMemory (java.lang.Runtime/getRuntime)). I wasn't familiar enough with VisualVM to pinpoint exactly where my issue was.

I'm happy with how it works now. Thanks, everyone, for your help.
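The code in this answer calls (memory-percent-used), which is not defined anywhere in the post. A minimal sketch of what such a helper might look like, assuming it reports the used share of the currently allocated heap through java.lang.Runtime. The name comes from the answer's code; the body is a guess, not the author's implementation:

```clojure
(defn memory-percent-used
  "Percentage of the currently allocated JVM heap that is in use.
  Assumed implementation; the answer's own helper is not shown."
  []
  (let [rt    (Runtime/getRuntime)
        total (.totalMemory rt)
        free  (.freeMemory rt)]
    (* 100.0 (/ (- total free) (double total)))))

;; Example use, mirroring the log line in the answer's row-fn
(println (format "Percent Mem used %.1f" (memory-percent-used)))
```

Measuring against totalMemory keeps the figure between 0 and 100, but the JVM can grow the heap, so comparing against (.maxMemory rt) may be the more useful number when hunting an OutOfMemory error.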

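Answer 1: (score: 3)

You can use prepare-statement with the :fetch-size option. Otherwise, the query itself is eager even though the results are delivered as a lazy sequence.

prepare-statement requires a connection object, so you'll need to create one explicitly. Here's an example of how your usage might look: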

(let [db-spec    local-postgres
      sql        "select * from big_table limit 500000 "
      fetch-size 10000 ;; or whatever's appropriate
      cnxn       (doto (j/get-connection db-spec)
                   (.setAutoCommit false))
      stmt       (j/prepare-statement cnxn sql :fetch-size fetch-size)
      results    (rest (j/query cnxn [stmt]))]
  ;; ...
  )

Another option

Since the problem seems to be with query, try with-query-results. It is considered deprecated but is still there and works. Here's an example usage:

(let [db-spec    local-postgres
      sql        "select * from big_table limit 500000 "
      fetch-size 100 ;; or whatever's appropriate
      cnxn       (doto (j/get-connection db-spec)
                   (.setAutoCommit false))
      stmt       (j/prepare-statement cnxn sql :fetch-size fetch-size)]
  (j/with-query-results results [stmt] ;; binds the results to `results`
    (doseq [row results]
      ;;
      )))
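A doseq loop only processes rows incrementally if nothing upstream has already realized the whole result. The following DB-free sketch illustrates that difference. fake-rows is a hypothetical stand-in for a JDBC result set, and the function passed in plays the role of :result-set-fn; neither is part of clojure.java.jdbc:

```clojure
(defn fake-rows
  "Hypothetical stand-in for a result set: an unchunked lazy seq of n rows.
  Every row that gets realized bumps the `realized` atom."
  [n realized]
  (letfn [(step [i]
            (lazy-seq
              (when (< i n)
                (swap! realized inc)
                (cons [i (str "name-" i)] (step (inc i))))))]
    (step 0)))

(defn realized-before-first-row
  "Passes n fake rows through result-set-fn, then walks the result with doseq.
  Returns how many rows had been realized when the first row was processed."
  [result-set-fn n]
  (let [realized (atom 0)
        at-first (atom nil)]
    (doseq [_row (result-set-fn (fake-rows n realized))]
      (when (nil? @at-first)
        (reset! at-first @realized)))
    @at-first))

(println "doall:   " (realized-before-first-row doall 1000))
(println "identity:" (realized-before-first-row identity 1000))
```

With doall, every row exists before the first one is handled; with identity, only the first one does. A real query has one extra constraint this sketch ignores: the rows must be consumed while the connection is still open.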

Answer 2: (score: 2)

I've found a better solution: you need to declare a cursor inside a transaction and fetch chunks of data from it. For example:

  (db/with-tx
    (db/execute! "declare cur cursor for select * from huge_table")
    (loop []
      (when-let [rows (-> "fetch 10 from cur" db/query not-empty)]
        (doseq [row rows]
          (process-a-row row))
        (recur))))

Here, db/with-tx, db/execute! and db/query are my own shortcuts declared in the db namespace:

(def ^:dynamic
  *db* {:dbtype "postgresql"
        :connection-uri "<some db url>"}) ; placeholder, supply your own URL

(defn query [& args]
  (apply jdbc/query *db* args))

(defn execute! [& args]
  (apply jdbc/execute! *db* args))

(defmacro with-tx
  "Runs a series of queries inside a transaction."
  [& body]
  `(jdbc/with-db-transaction [tx# *db*]
     (binding [*db* tx#]
       ~@body)))
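The termination logic of the fetch loop in this answer can be tried without PostgreSQL. In this sketch, make-cursor is a hypothetical in-memory stand-in for "fetch N from cur", and process-in-chunks is an illustrative name; only the loop shape (not-empty, doseq, recur) mirrors the answer:

```clojure
(defn make-cursor
  "Hypothetical in-memory stand-in for a server-side cursor.
  Returns a function that hands out the next n rows on each call
  and an empty vector once the rows are exhausted."
  [rows]
  (let [remaining (atom (seq rows))]
    (fn [n]
      (let [[chunk more] (split-at n @remaining)]
        (reset! remaining more)
        (vec chunk)))))

(defn process-in-chunks
  "Calls process-row on every row, pulling chunk-size rows at a time.
  Returns the number of rows processed."
  [fetch chunk-size process-row]
  (loop [total 0]
    ;; not-empty turns an empty chunk into nil, which ends the loop
    (if-let [rows (not-empty (fetch chunk-size))]
      (do
        (doseq [row rows]
          (process-row row))
        (recur (+ total (count rows))))
      total)))

(let [fetch (make-cursor (range 25))
      seen  (atom [])]
  (println "rows processed:" (process-in-chunks fetch 10 #(swap! seen conj %))))
```

Only one chunk is held at a time, so memory use is bounded by the chunk size rather than by the size of the table.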