Clojure XML stream closed exception

Date: 2017-04-03 20:59:17

Tags: xml clojure xml-parsing

I'm getting an exception when parsing an XML file with clojure.data.xml, because the stream is closed before parsing has finished.

I don't understand why doall doesn't force evaluation of the XML data before with-open closes the reader (as this related answer suggests it should):

(:require [clojure.java.io :as io]
          [clojure.data.xml :as xml])

(defn file->xml [path] 
  (with-open [rdr (-> path io/resource io/reader)] 
    (doall (xml/parse rdr))))

This throws an exception:

(file->xml "example.xml")
;-> XMLStreamException ParseError at [row,col]:[80,1926]
Message: Stream closed com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next

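If I remove the with-open wrapper, it returns the XML data as expected (so the file itself is valid, although the reader is then no longer guaranteed to be closed).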

I see from (source xml/parse) that it produces a lazy result:

(defn parse
  "Parses the source, which can be an
   InputStream or Reader, and returns a lazy tree of Element records. 
   Accepts key pairs with XMLInputFactory options, see http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLInputFactory.html
   and xml-input-factory-props for more information. 
   Defaults coalescing true."
   [source & opts]
     (event-tree (event-seq source opts)))

So perhaps that is related, but my function is very similar to the "round-trip" example in the clojure.data.xml README.

What am I missing here?

1 Answer:

Answer 0 (score: 3)

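I was surprised to see this behavior. It seems that clojure.data.xml.Element (the return type) implements a kind of "lazy map" that is not affected by doall. doall only walks the top level of the collection it is given; it does not descend into nested lazy sequences such as an element's :content, so the child nodes are still unread when with-open closes the reader.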
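The shallow behavior of doall can be reproduced without any XML. The following is a minimal sketch: `node` is a hypothetical stand-in for a parsed element (a plain map whose :content is a lazy sequence), not the actual clojure.data.xml type.

```clojure
;; Hypothetical stand-in for a parsed element: a plain map whose :content
;; is an unrealized lazy sequence. This is NOT the real clojure.data.xml type.
(def node {:tag :root, :attrs {}, :content (map inc [1 2 3])})

;; doall walks only the map's own entries; it never touches the lazy
;; sequence stored under :content.
(def realized-after-outer-doall
  (do (doall node)
      (realized? (:content node))))

;; Forcing the nested sequence itself is what realizes it.
(def realized-after-inner-doall
  (do (doall (:content node))
      (realized? (:content node))))
```

Under these assumptions the first check stays false and only the second becomes true, which is why a fix has to walk the whole tree rather than call doall once at the top.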

Here is a solution that converts the lazy values into ordinary maps and vectors:

(ns tst.clj.core
  (:use clj.core clojure.test tupelo.test)
  (:require
    [tupelo.core :as t]
    [clojure.string :as str]
    [clojure.pprint :refer [pprint]]
    [clojure.java.io :as io]
    [clojure.data.xml :as xml]
    [clojure.walk :refer [postwalk]]
  ))
(t/refer-tupelo)

(defn unlazy
  [coll]
  (let [unlazy-item (fn [item]
                      (cond
                        (sequential? item) (vec item)
                        (map? item) (into {} item)
                        :else item))
        result    (postwalk unlazy-item coll) ]
    result ))

(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader) ]
    (let [lazy-vals    (xml/parse rdr)
          eager-vals   (unlazy lazy-vals) ]
      eager-vals)))
(pprint (file->xml "books.xml"))

{:tag :catalog,
 :attrs {},
 :content
 [{:tag :book,
   :attrs {:id "bk101"},
   :content
   [{:tag :author, :attrs {}, :content ["Gambardella, Matthew"]}
    {:tag :title, :attrs {}, :content ["XML Developer's Guide"]}
    {:tag :genre, :attrs {}, :content ["Computer"]}
    {:tag :price, :attrs {}, :content ["44.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-10-01"]}
    {:tag :description,
     :attrs {},
     :content
     ["An in-depth look at creating applications\n      with XML."]}]}
  {:tag :book,
   :attrs {:id "bk102"},
   :content
   [{:tag :author, :attrs {}, :content ["Ralls, Kim"]}
    {:tag :title, :attrs {}, :content ["Midnight Rain"]}
    {:tag :genre, :attrs {}, :content ["Fantasy"]}
    {:tag :price, :attrs {}, :content ["5.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-12-16"]}
    {:tag :description,
     :attrs {},
     :content
     ["A former architect battles corporate zombies,\n      an evil sorceress, and her own childhood to become queen\n      of the world."]}]}
  {:tag :book,
   :attrs {:id "bk103"},
   :content .....

Since clojure.data.xml.Element implements clojure.lang.IPersistentMap, (map? item) returns true for it.
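The same check can be tried on any record. In this small sketch, `Node` is a hypothetical record used only for illustration; it stands in for Element and is not part of clojure.data.xml.

```clojure
;; Hypothetical record standing in for clojure.data.xml's Element.
(defrecord Node [tag attrs content])

(def n (->Node :title {} (map identity ["Midnight Rain"])))

;; Records implement IPersistentMap, so map? is true for them.
(def is-map?    (map? n))
(def is-record? (record? n))

;; (into {} ...) copies the entries into a plain hash map. It copies only
;; the top level; walking the children is what postwalk is for in unlazy.
(def plain (into {} n))
(def plain-is-record? (record? plain))
```

This is why the (map? item) branch in unlazy also catches records and turns them into plain maps.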

Here is the sample data for books.xml.

Please note:

clojure.data.xml is different from clojure.xml. You may want to explore both libraries to find the one that best fits your needs.

You can also use crossclj.info to look up API documentation when needed.
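For comparison with clojure.data.xml, here is a rough sketch using clojure.xml, which ships with Clojure and builds the whole tree eagerly while parsing, so the input can be closed as soon as parse returns. The helper name `parse-eagerly` and the inline XML string are made up for this example.

```clojure
(require '[clojure.xml :as cxml])
(import 'java.io.ByteArrayInputStream)

;; Hypothetical helper: parse an XML string with clojure.xml inside with-open.
;; clojure.xml/parse reads the whole document before it returns, so closing
;; the stream afterwards is safe.
(defn parse-eagerly [^String xml-string]
  (with-open [in (ByteArrayInputStream. (.getBytes xml-string "UTF-8"))]
    (cxml/parse in)))

(def parsed
  (parse-eagerly
    "<catalog><book id=\"bk101\"><title>XML Developer's Guide</title></book></catalog>"))
```

The trade-off is that the eager parser holds the entire document in memory, whereas the lazy tree from clojure.data.xml is meant for streaming through large files.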

Update

About a week after I saw this question, I ran into an XML parsing problem of my own that needed the unlazy function, just like this one. You can now find unlazy in the Tupelo library.