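I am getting an exception when using clojure.data.xml to parse an XML file, because the stream is closed before parsing completes.
I don't understand why doall does not force evaluation of the XML data before with-open closes the reader (as suggested by this related answer):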
(:require [clojure.java.io :as io]
          [clojure.data.xml :as xml])

(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader)]
    (doall (xml/parse rdr))))
Calling it throws this exception:
(file->xml "example.xml")
;-> XMLStreamException ParseError at [row,col]:[80,1926]
Message: Stream closed com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next
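If I remove the with-open wrapper, it returns the XML data as expected (so the file is valid, although the reader is then not guaranteed to be closed).
I see from (source xml/parse) that it produces a lazy result: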
(defn parse
  "Parses the source, which can be an
   InputStream or Reader, and returns a lazy tree of Element records.
   Accepts key pairs with XMLInputFactory options, see http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLInputFactory.html
   and xml-input-factory-props for more information.
   Defaults coalescing true."
  [source & opts]
  (event-tree (event-seq source opts)))
So perhaps that is related, but my function is very similar to the "round-trip" example in the clojure.data.xml README.
What am I missing here?
Answer 0 (score: 3)
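I was surprised to see this behavior. It seems that clojure.data.xml.Element (the return type) implements a kind of "lazy map" that is not affected by doall. doall only realizes the top-level sequence it is handed, so lazy content nested inside each Element can remain unparsed when with-open closes the reader.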
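As a rough illustration of why doall may not help here, the sketch below uses only clojure.core. The lazy-node function and the realized-count atom are hypothetical stand-ins for a parsed Element and are not part of clojure.data.xml; the only point is that doall walks the top-level seq of its argument and leaves lazy sequences nested inside values untouched.

```clojure
;; Hypothetical stand-in for a parsed element: a map whose :content is lazy.
;; realized-count records how many children have actually been built.
(def realized-count (atom 0))

(defn lazy-node []
  {:tag     :root
   :attrs   {}
   :content (map (fn [i]
                   (swap! realized-count inc)
                   {:tag :child :attrs {} :content [i]})
                 (range 3))})

;; doall seqs over the map's entries but never walks into the :content value.
(def node (doall (lazy-node)))

(def realized-after-doall @realized-count)
(def content-realized? (realized? (:content node)))

;; Walking into :content (here with vec) is what actually forces it.
(def forced (vec (:content node)))
(def realized-after-vec @realized-count)
```

In this sketch no child is built by doall; they are only built once something descends into :content, which is what a recursive walk has to do before the reader is closed.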
Here is a solution that converts the lazy values into ordinary maps:
(ns tst.clj.core
  (:use clj.core clojure.test tupelo.test)
  (:require
    [tupelo.core :as t]
    [clojure.string :as str]
    [clojure.pprint :refer [pprint]]
    [clojure.java.io :as io]
    [clojure.data.xml :as xml]
    [clojure.walk :refer [postwalk]]))
(t/refer-tupelo)

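(defn unlazy
  "Walks coll depth-first, replacing every sequential item with a vector
   and every map-like item (including Element) with a plain map."
  [coll]
  (let [unlazy-item (fn [item]
                      (cond
                        (sequential? item) (vec item)
                        (map? item)        (into {} item)
                        :else              item))]
    (postwalk unlazy-item coll)))
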
(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader)]
    ;; Realize the whole tree while the reader is still open.
    (let [lazy-vals  (xml/parse rdr)
          eager-vals (unlazy lazy-vals)]
      eager-vals)))

(pprint (file->xml "books.xml"))
{:tag :catalog,
 :attrs {},
 :content
 [{:tag :book,
   :attrs {:id "bk101"},
   :content
   [{:tag :author, :attrs {}, :content ["Gambardella, Matthew"]}
    {:tag :title, :attrs {}, :content ["XML Developer's Guide"]}
    {:tag :genre, :attrs {}, :content ["Computer"]}
    {:tag :price, :attrs {}, :content ["44.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-10-01"]}
    {:tag :description,
     :attrs {},
     :content
     ["An in-depth look at creating applications\n with XML."]}]}
  {:tag :book,
   :attrs {:id "bk102"},
   :content
   [{:tag :author, :attrs {}, :content ["Ralls, Kim"]}
    {:tag :title, :attrs {}, :content ["Midnight Rain"]}
    {:tag :genre, :attrs {}, :content ["Fantasy"]}
    {:tag :price, :attrs {}, :content ["5.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-12-16"]}
    {:tag :description,
     :attrs {},
     :content
     ["A former architect battles corporate zombies,\n an evil sorceress, and her own childhood to become queen\n of the world."]}]}
  {:tag :book,
   :attrs {:id "bk103"},
   :content .....
Since clojure.data.xml.Element implements clojure.lang.IPersistentMap, (map? item) returns true for it, which is why unlazy converts each Element into a plain map.
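To make the map? point concrete, here is a small sketch using a record defined only for this example. Node is a hypothetical stand-in, not the real clojure.data.xml.Element; it assumes Element behaves like any other IPersistentMap implementation, as described above.

```clojure
;; Node is a hypothetical record used only for illustration.
(defrecord Node [tag attrs content])

(def n (->Node :title {} ["Midnight Rain"]))

;; Records implement clojure.lang.IPersistentMap, so map? accepts them.
(def node-is-map? (map? n))
(def node-is-ipm? (instance? clojure.lang.IPersistentMap n))

;; (into {} n) copies the record's entries into a plain persistent map.
(def plain (into {} n))

;; The copy has the same keys and values but is no longer a record.
(def same-entries? (= plain {:tag :title, :attrs {}, :content ["Midnight Rain"]}))
(def still-record? (record? plain))
```

This is the same conversion that the (map? item) branch of unlazy applies to each node it visits.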
clojure.data.xml is different from clojure.xml. You may want to explore both libraries to find the one that best fits your needs.
You can also use crossclj.info to look up API documentation when needed:
About a week after I saw this question, I ran into an XML parsing problem of my own that, like this one, needed the unlazy function. You can now find unlazy in the Tupelo library.