How to remove empty xml tags using clojure.data.xml?

Time: 2018-12-05 02:11:43

Tags: xml clojure

Given a namespaced xml (the namespaces are ignored in this example):

<foo>
    <name>John</name>
    <address>1 hacker way</address>
    <phone></phone>
    <school>
        <name></name>
        <state></state>
        <type></type>
    </school>
    <college>
        <name>mit</name>
        <address></address>
        <state></state>
    </college>
</foo>

How can I write a function remove-empty-tags, using clojure.data.xml, that returns the following?

<foo>
  <name>John</name>
  <address>1 hacker way</address>
  <college> 
    <name>mit</name>
  </college>
</foo>

My solution so far is incomplete, and it looks like some recursion might help:

(require '[clojure.data.xml :as xml]
         '[clojure.string :refer [blank?]])

(defn- child-element? [e]
  (let [content (:content e)]
    (and (= (count content)
            (count (filter #(instance? clojure.data.xml.node.Element %) content))))))


(defn remove-empty-tags
  [xml-data]
  (let [empty-tags? #(or (empty? %) (-> % .toString blank?))]
    ;; only filters the top level; nested empty tags are left in place
    (reduce (fn [col e]
              (if-not (empty-tags? (:content e))
                (merge col e)
                col))
            []
            xml-data)))

(def body (slurp "sample.xml")) ;; the above xml
(def xml-data (-> (xml/parse (java.io.StringReader. body)) :content))

(remove-empty-tags xml-data)

which, after converting back to xml, returns:

<foo>
    <name>John</name>
    <address>1 hacker way</address>
    <school>
        <name/>
        <state/>
    </school>
    <college>
        <name>mit</name>
        <address/>
        <state/>
    </college>
</foo>

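Clearly, this function needs to recurse, using child-element?, to remove the empty child nodes.

Suggestions?

3 Answers:

Answer 0: (score: 1)

Here is a very simple solution using clojure.walk/postwalk: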

(require '[clojure.walk]) ;; postwalk lives in clojure.walk

(defn remove-empty-elements [xml-data]
  (clojure.walk/postwalk
   (fn [v]
     (cond
       (and (instance? clojure.data.xml.Element v)
            (every? empty? (:content v)))
       nil ;; nil-out elements with no content
       (instance? clojure.data.xml.Element v)
       (update v :content #(filter some? %)) ;; filter nils from contents
       :else v))
   xml-data))

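This works by traversing the XML data depth-first, replacing elements that have no :content with nil, and then filtering those nils out of the :content collections of the other elements. Because postwalk visits children before their parent, an element whose children have all been replaced by nil is itself removed.

Note: if you are only emitting a string, you can omit the second cond clause, the one that starts with (instance? clojure.data.xml.Element v), because xml/emit-str ignores nils in :content collections; the same string is emitted either way.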

(println (xml/emit-str (remove-empty-elements xml-data)))

Formatted output:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
    <name>John</name>
    <address>1 hacker way</address>
    <college>
        <name>mit</name>
    </college>
</foo>
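
To see the depth-first pruning idea from this answer in isolation, here is a minimal sketch that needs only clojure.walk and clojure.string. It is not the answer's code: plain maps with :tag and :content stand in for clojure.data.xml elements, and the names element?, blank-node?, prune-empty and sample are made up for illustration. Unlike the answer, it also treats whitespace-only strings as empty.

```clojure
(require '[clojure.walk :as walk]
         '[clojure.string :as str])

;; Stand-in for an XML element: any map that has a :tag key (an assumption).
(defn element? [v]
  (and (map? v) (contains? v :tag)))

;; A child counts as blank if it was pruned (nil) or is a whitespace-only string.
(defn blank-node? [v]
  (or (nil? v)
      (and (string? v) (str/blank? v))))

(defn prune-empty [tree]
  (walk/postwalk
   (fn [v]
     (cond
       ;; children are visited first, so pruned children are already nil here
       (and (element? v) (every? blank-node? (:content v)))
       nil

       ;; keep the element, but drop its blank children
       (element? v)
       (update v :content #(vec (remove blank-node? %)))

       :else v))
   tree))

(def sample
  {:tag :foo
   :content [{:tag :name :content ["John"]}
             {:tag :phone :content []}
             {:tag :school :content [{:tag :name :content []}
                                     {:tag :state :content []}]}
             {:tag :college :content [{:tag :name :content ["mit"]}
                                      {:tag :address :content []}]}]})

(prune-empty sample)
;; => {:tag :foo, :content [{:tag :name, :content ["John"]} {:tag :college, :content [{:tag :name, :content ["mit"]}]}]}
```

:school disappears even though it has children, because all of them had already been turned into nil by the time postwalk reached it.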

Answer 1: (score: 1)

You can use the Tupelo Forest library to easily manipulate tree-like data structures. A video from the 2017 Clojure Conj gives an introduction. For your problem:

  (let [xml-data "<foo>
                  <name>John</name>
                  <address>1 hacker way</address>
                  <phone></phone>
                  <school>
                      <name></name>
                      <state></state>
                      <type></type>
                  </school>
                  <college>
                      <name>mit</name>
                      <address></address>
                      <state></state>
                  </college>
                </foo> "]

We add the xml data to a new forest and remove all blank nodes:

  (with-forest (new-forest)
    (let [root-hid (add-tree-xml xml-data)]
      (remove-whitespace-leaves)

Result:

(hid->hiccup root-hid) => 

    [:foo
     [:name "John"]
     [:address "1 hacker way"]
     [:phone]
     [:school [:name] [:state] [:type]]
     [:college [:name "mit"] [:address] [:state]]]

We can walk the tree and remove the empty nodes like so:

(walk-tree root-hid {:leave (fn [hid] (when (empty-leaf-hid? hid) (remove-hid hid)))})

Result:

(hid->hiccup root-hid) =>

     [:foo 
       [:name "John"]
       [:address "1 hacker way"]
       [:college 
        [:name "mit"]]]

Update

The live code can be seen here


Update #2

If you want to run the code, you also need the ns form used in the live code example above.

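Answer 2: (score: 0)

I was able to solve this by combining recursion and reduce (my original partial answer, now complete). The key is to pass the head of each node down recursively, so that reduce can attach the transformed child nodes to that head.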

(defn- child-element? [e]
    (let [content (:content e)]
      (and (= (count content)
              (count (filter #(instance? clojure.data.xml.node.Element %) content))))))

(require '[clojure.string :refer [blank?]])

(defn- empty-element? [e]
  (or (empty? e) (-> e .toString blank?)))

(defn element? [e]
  (and (instance? clojure.lang.LazySeq e)
       (instance? clojure.data.xml.node.Element (first e))))

(defn remove-empty-elements!
  "Remove empty elements (and child elements) in an xml"
  [head xml-data]
  (let [data (if (seq? xml-data) xml-data (:content xml-data))
        rs (reduce (fn [col e]
              (let [content (:content e)]
                (cond
                  (empty-element? content)
                  col

                  (and (not (element? content)) (not (every? empty-element? content)))
                  (merge col e)

                  (and (element? content) (every? true? (map #(empty-element? (:content %)) content)))
                  col

                  (and (child-element? content))
                  (let [_head (xml/element (:tag e) {})]
                    (merge col (remove-empty-elements! _head content)))

                  :else col)))
            []
            data)]
    (assoc head :content rs)))


;; test (xml-data here is the parsed root element, not its :content)
(remove-empty-elements! (xml/element (:tag xml-data) {}) xml-data)
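
For comparison, the recursive idea can also be written without the head argument or reduce. This is a minimal sketch rather than the code above: it uses plain maps with :tag and :content in place of clojure.data.xml elements, and the names prune and tree are hypothetical. It should carry over to parsed elements as long as they support keyword lookup and assoc, but that is an assumption I have not tested here.

```clojure
(require '[clojure.string :as str])

(defn prune
  "Returns node without its empty descendants, or nil if node itself ends up empty."
  [node]
  (if (string? node)
    ;; text node: keep it only if it is not blank
    (when-not (str/blank? node) node)
    ;; element: prune the children first; keep drops the nil results
    (let [children (vec (keep prune (:content node)))]
      (when (seq children)
        (assoc node :content children)))))

(def tree
  {:tag :foo
   :content [{:tag :name :content ["John"]}
             {:tag :phone :content []}
             {:tag :college :content [{:tag :name :content ["mit"]}
                                      {:tag :state :content []}]}]})

(prune tree)
```

Each call returns either the pruned node or nil, so the parent decides whether to keep itself purely from what its children return.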