First, I parse the XML file like this:
(def xtest (slurp "./resources/smallXMLTest.xml"))
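(def way1 (clojure.xml/parse (java.io.ByteArrayInputStream. (.getBytes xtest "UTF-8")))) ; parse needs an input source; here, the slurped string as a stream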
(:content way1)
and the :content of the resulting map contains no "\n" entries.
But when I parse the XML with clojure.data.xml instead:
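(require '[clojure.java.io :as io])
(defn fr [filename] ; must be a function of the file name, not a plain def
  (-> filename
      io/file
      io/input-stream
      io/reader))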
(def fileAsStream (fr XMLfilePath))
(def way2 (clojure.data.xml/parse fileAsStream))
then in the :content of every non-leaf element of the way2 var, I get "\n" strings between each pair of inner XML elements :(
Is there a way to avoid these "\n" strings?
Answer 0 (score: 3)
I recently added 2 XML parsers to the Tupelo library, one based on clojure.data.xml and the other based on tagsoup. In both cases, whitespace-only nodes are removed by default. This is the operative function:
(defn enlive-remove-whitespace
  "Removes whitespace strings from Enlive data :content vectors."
  [item]
  (if (and (map? item) ; Enlive data parsed from XML may have raw strings (esp. whitespace) embedded in it
           (contains-key? item :tag)) ; when parsing html, may get non-enlive nodes like {:type :comment, :data "..."}
    (let [content-new (cond-it-> (:content item)
                        (or (nil? it) (empty? it)) []
                        :then (drop-if (fn [arg]
                                         (and (string? arg)
                                              (ts/whitespace? arg))) it)
                        :then (mapv enlive-remove-whitespace it))]
      (glue item {:content content-new}))
    item))
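The function above depends on Tupelo helpers (cond-it->, drop-if, glue, contains-key?, ts/whitespace?). As a rough sketch of the same idea using only clojure.core and clojure.string, something like the following could be applied to the tree returned by clojure.data.xml/parse. The names remove-whitespace-nodes and raw-tree are hypothetical and not part of Tupelo, and clojure.string/blank? is assumed to be an acceptable stand-in for ts/whitespace?.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of Tupelo: a plain-Clojure take on the same idea.
(defn remove-whitespace-nodes
  "Recursively drops whitespace-only strings from the :content of
  element maps (any map or record that has a :tag key)."
  [item]
  (if (and (map? item) (contains? item :tag))
    (assoc item :content
           (->> (:content item)
                ;; drop strings such as "\n  " that sit between child elements
                (remove #(and (string? %) (str/blank? %)))
                ;; recurse into the remaining children; nil content becomes []
                (mapv remove-whitespace-nodes)))
    item))

;; Hand-written sample tree in the shape the question describes.
(def raw-tree
  {:tag :foo, :attrs {},
   :content ["\n  "
             {:tag :name, :attrs {}, :content ["John"]}
             "\n  "
             {:tag :phone, :attrs {}, :content nil}
             "\n"]})

(remove-whitespace-nodes raw-tree)
;; => {:tag :foo, :attrs {},
;;     :content [{:tag :name, :attrs {}, :content ["John"]}
;;               {:tag :phone, :attrs {}, :content []}]}
```

Because records answer true to map? and support assoc, this sketch should also work on the Element records that clojure.data.xml produces, for example (remove-whitespace-nodes way2).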
enlive-remove-whitespace is used in tupelo.parse.xml as follows:
(s/defn parse ; #todo fix docstring
  ([xml-input] (parse xml-input sax-parse-fn))
  ([xml-input parse-fn]
   (enlive-remove-whitespace
     (enlive-normalize
       (parse-raw xml-input parse-fn)))))
So, if you don't want the resulting Enlive-format data to be normalized or whitespace-trimmed, you can use the function parse-raw instead.
Similar choices for parse and parse-raw are available in the tupelo.parse.tagsoup namespace.
You can see usage examples in the test ns:
(def xml-str "<foo>
                <name>John</name>
                <address>1 hacker way</address>
                <phone></phone>
                <school>
                  <name>Joe</name>
                  <state>CA</state>
                  <type>FOOBAR</type>
                </school>
                <college>
                  <name>mit</name>
                  <address></address>
                  <state>Denial</state>
                </college>
              </foo> ")
(def enlive-tree-normalized-nonblank
  {:tag     :foo,
   :attrs   {},
   :content [{:tag :name, :attrs {}, :content ["John"]}
             {:tag :address, :attrs {}, :content ["1 hacker way"]}
             {:tag :phone, :attrs {}, :content []}
             {:tag     :school,
              :attrs   {},
              :content [{:tag :name, :attrs {}, :content ["Joe"]}
                        {:tag :state, :attrs {}, :content ["CA"]}
                        {:tag :type, :attrs {}, :content ["FOOBAR"]}]}
             {:tag     :college,
              :attrs   {},
              :content [{:tag :name, :attrs {}, :content ["mit"]}
                        {:tag :address, :attrs {}, :content []}
                        {:tag :state, :attrs {}, :content ["Denial"]}]}]})
with the result:
(dotest
  (let [xml-data     (xml/parse (ts/string->stream xml-str))
        tagsoup-data (tagsoup/parse (ts/string->stream xml-str))]
    (is= enlive-tree-normalized-nonblank xml-data)
    (is= enlive-tree-normalized-nonblank tagsoup-data)))