Question

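I have a valid XHTML file (100 MB of data) containing one big table. The first tr holds the column names (for a database); every other tr is data. It is the only table in the whole document, and it sits at html > body > div > table.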

How can I parse it lazily in Clojure?

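I know about data.xml, but since I'm a Clojure beginner I'm having a hard time getting it to work, especially because the REPL is very slow when working with such a large file.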

Answer 1

The data.xml docs say that parse creates a lazy tree of the document. I checked this locally, and it does seem to be true:

;; Load libs
(require '[clojure.data.xml :as xml])
(require '[clojure.java.io :as io])

;; standard.xml is a 100 MB XML file from http://www.xml-benchmark.org/downloads.html
(def xml-tree (xml/parse (io/reader "standard.xml")))
(:tag xml-tree)
;; => :site

(def child (first (:content xml-tree)))
(:tag child)
;; => :regions

;; Realizing the root's entire :content forces the whole file to be parsed,
;; so the REPL hangs for ~30 seconds on my computer.
(dorun (:content xml-tree))
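Building on that, here is a minimal sketch that walks html > body > div > table and turns the rows into maps lazily, so the file is only parsed as far as you consume the rows. It assumes clojure.data.xml is on the classpath and that each tr sits directly under table (no tbody) with td or th cells; the namespace and function names (example.lazy-table, table-rows, print-first-rows, etc.) are hypothetical. Tags are matched by their local name because some data.xml versions put the XHTML namespace into the tag keyword. Memory only stays bounded if nothing, such as a top-level def like xml-tree above, keeps a reference to the parsed root while the rows are being consumed.

```clojure
(ns example.lazy-table
  (:require [clojure.data.xml :as xml]
            [clojure.java.io :as io]
            [clojure.string :as str]))

(defn- local-name
  "Local tag name of an element node, or nil for text and other non-element
  nodes. Ignoring the namespace part keeps this working whether or not the
  parser attaches the XHTML namespace to the tag keyword."
  [node]
  (when (map? node)
    (some-> (:tag node) name)))

(defn- children
  "Lazy seq of the child elements of el whose local name is in the set names.
  Whitespace text between tags is skipped."
  [el names]
  (filter #(contains? names (local-name %)) (:content el)))

(defn- raw-text
  "All text inside node, including text in nested inline elements."
  [node]
  (cond
    (string? node)    node
    (local-name node) (apply str (map raw-text (:content node)))
    :else             ""))

(defn- cell-text [cell]
  (str/trim (raw-text cell)))

(defn table-rows
  "Lazy seq of row maps for the table at html > body > div > table.
  The first tr supplies the column names (strings); every later tr becomes
  one map from column name to cell text."
  [root]
  (let [body    (first (children root #{"body"}))
        div     (first (children body #{"div"}))
        table   (first (children div #{"table"}))
        trs     (children table #{"tr"})
        columns (mapv cell-text (children (first trs) #{"th" "td"}))]
    (map (fn [tr]
           (zipmap columns (map cell-text (children tr #{"td" "th"}))))
         (rest trs))))

;; Usage sketch: the seq is lazy, so consume it while the reader is still open.
(defn print-first-rows [path n]
  (with-open [rdr (io/reader path)]
    (doseq [row (take n (table-rows (xml/parse rdr)))]
      (prn row))))
```

For the real import you would replace prn with a database insert inside the same doseq, so each row can be discarded once it has been written instead of the whole table being realized at once.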
