Parsing XML in Clojure

Time: 2011-06-24 15:22:11

Tags: xml clojure

I am new to Clojure, so please bear with me. I have an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<XVar Id="cdx9" Type="Dictionary">
  <XVar Id="Base.AccruedPremium" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="0"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.4380728252313069"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexLevel01" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="30693.926279941188"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDelta" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="8.9304387917502073"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.0775955481964035"/>
    </Row>
  </XVar>
</XVar>

The structure repeats. From this, I want to generate a CSV file with these columns:

IndexName,TrancheAnalysis.IndexDuration,TrancheAnalysis.TrancheDuration
cdx9,3.4380728252313069,3.0775955481964035
.........................................
.........................................

I am able to parse a simpler XML file like this one:

<?xml version="1.0" encoding="UTF-8"?>
<CalibrationData>
  <IndexList>
    <Index>
      <Calibrate>Y</Calibrate>
      <UseClientIndexQuotes>Y</UseClientIndexQuotes>
      <IndexName>HYCDX10</IndexName>
      <Tenor>06/20/2013</Tenor>
      <TenorName>3Y</TenorName>
      <IndexLevels>219.6</IndexLevels>
      <Tranche>Equity0To0.15</Tranche>
      <TrancheStart>0</TrancheStart>
      <TrancheEnd>0.15</TrancheEnd>
      <UseBreakEvenSpread>1</UseBreakEvenSpread>
      <UseTlet>0</UseTlet>
      <IsTlet>0</IsTlet>
      <PctExpectedLoss>0</PctExpectedLoss>
      <UpfrontFee>52.125</UpfrontFee>
      <RunningFee>0</RunningFee>
      <DeltaFee>5.3</DeltaFee>
      <CentralCorrelation>0.1</CentralCorrelation>
      <Currency>USD</Currency>
      <RescalingMethod>PTIndexRescaling</RescalingMethod>
      <EffectiveDate>06/17/2011</EffectiveDate>
    </Index>
  </IndexList>
</CalibrationData>

using this code:

(ns DynamicProgramming
  (:require [clojure.xml :as xml]))
; Get the input files
(def calibrationFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/CalibrationQuotes.xml")
(def mktdataFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/MarketData.xml")
(def sample "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/Sample.xml")

; Parse the calibration input file and keep the text content of the selected elements.
; A set used as a function returns the tag if it is a member, nil otherwise.
(def CalibOp
  (for [x (xml-seq (xml/parse (java.io.File. calibrationFile)))
        :when (#{:IndexName :Tenor :UpfrontFee :RunningFee
                 :DeltaFee :IndexLevels :TrancheStart :TrancheEnd} (:tag x))]
    (first (:content x))))

(println CalibOp)

But that second XML file is simple. I don't know how to traverse the nested structure of the first XML example and extract the information I want.

Any help would be great.

1 answer:

Answer 0 (score: 8)

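I would use data.zip (formerly clojure.contrib.zip-filter). It provides a lot of XML-processing functionality and makes it easy to write XPath-like expressions. The README describes it as a system for filtering trees, and XML trees in particular.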

Below is some sample code that builds a "row" for the CSV file. The row is a map from column name to attribute value.

(ns work
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]
            ; from the org.clojure/data.zip library (formerly clojure.contrib.zip-filter.xml)
            [clojure.data.zip.xml :as zf]))

; create a zip from the xml file
(def zip (zip/xml-zip (xml/parse "data.xml")))

; pulls out a list of all of the root "Id" attribute values
(zf/xml-> zip (zf/attr :Id))

(defn value
  "Finds the id and value for a particular element"
  [xvar-zip]
  (let [id (-> xvar-zip zip/node :attrs :Id) ; manual access
        value (zf/xml1-> xvar-zip ; use an XPath-like expression to pull the value out
                         :Row ; need the row element
                         :Col ; then the column element
                         (zf/attr :Value))] ; and finally pull the Value out
    {id value}))

; gets the "column-value" pair for a single column
(zf/xml1-> zip
           (zf/attr= :Id "cdx9") ; filter on id "cdx9"
           :XVar ; filter on XVars under it
           (zf/attr= :Id "TrancheAnalysis.IndexDuration") ; filter on id
           value) ; apply the value function on the result of the above

; creates a map of every column key to its corresponding value
(apply merge (zf/xml-> zip (zf/attr= :Id "cdx9") :XVar value))
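If you would rather not add the data.zip dependency, roughly the same row can be built with only clojure.xml and clojure.core by walking the :content vectors of the parsed element maps directly. This is a minimal sketch, not the answer's method: it assumes every leaf XVar holds one Row with one Col (as in the sample), it parses a trimmed copy of the sample from a string instead of a file, and the names leaf-entry, dictionary->row, sample-xml and root are hypothetical.

```clojure
(require '[clojure.xml :as xml])

;; Trimmed copy of the question's sample (two of the five entries).
(def sample-xml
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<XVar Id=\"cdx9\" Type=\"Dictionary\">
  <XVar Id=\"TrancheAnalysis.IndexDuration\" Type=\"Multi\" Value=\"\" Rows=\"1\" Columns=\"1\">
    <Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"3.4380728252313069\"/></Row>
  </XVar>
  <XVar Id=\"TrancheAnalysis.TrancheDuration\" Type=\"Multi\" Value=\"\" Rows=\"1\" Columns=\"1\">
    <Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"3.0775955481964035\"/></Row>
  </XVar>
</XVar>")

(defn leaf-entry
  "Returns [id value] for a leaf XVar element map.
  Assumption: the value sits in the first Col of the first Row."
  [xvar]
  (let [row (first (filter #(= :Row (:tag %)) (:content xvar)))
        col (first (filter #(= :Col (:tag %)) (:content row)))]
    [(get-in xvar [:attrs :Id]) (get-in col [:attrs :Value])]))

(defn dictionary->row
  "One CSV row map: the dictionary Id under \"IndexName\",
  plus one entry per direct XVar child."
  [dict]
  (into {"IndexName" (get-in dict [:attrs :Id])}
        (for [x (:content dict)
              :when (= :XVar (:tag x))]
          (leaf-entry x))))

;; clojure.xml/parse accepts an InputStream as well as a File.
(def root (xml/parse (java.io.ByteArrayInputStream. (.getBytes sample-xml "UTF-8"))))

(dictionary->row root)
;; => {"IndexName" "cdx9", "TrancheAnalysis.IndexDuration" "3.4380728252313069", "TrancheAnalysis.TrancheDuration" "3.0775955481964035"}
```

Walking :content level by level, rather than flattening everything with xml-seq as in the question's code, keeps each Col value attached to the XVar that owns it.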

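I'm not sure how this would work with multiple Dictionary XVars, since the Dictionary XVar is the root element (an XML document can have only one). If you need it, another function that is useful for this kind of work is mapcat, which concatenates all the values returned by the mapping function.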
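To make the mapcat suggestion concrete: if several Dictionary XVars are available (for example one parsed file each), a function that returns a sequence per dictionary can be mapped over all of them and the results concatenated. The sketch below builds the element maps by hand, in the {:tag :attrs :content} shape that clojure.xml/parse produces, instead of reading files. The helper names and the second dictionary "example-2" with its value are made up for illustration.

```clojure
(defn xvar-value
  "Value attribute of the first Col found under the Rows of an XVar element map."
  [xvar]
  (some (fn [row]
          (some #(when (= :Col (:tag %)) (get-in % [:attrs :Value]))
                (:content row)))
        (filter #(= :Row (:tag %)) (:content xvar))))

(defn dictionary-triples
  "One [index-name id value] triple per nested XVar of a Dictionary XVar."
  [dict]
  (for [x (:content dict)
        :when (= :XVar (:tag x))]
    [(get-in dict [:attrs :Id]) (get-in x [:attrs :Id]) (xvar-value x)]))

(defn xvar
  "Hypothetical helper: builds a leaf XVar element map for testing."
  [id value]
  {:tag :XVar :attrs {:Id id :Type "Multi"}
   :content [{:tag :Row :attrs {:Id "0"}
              :content [{:tag :Col :attrs {:Id "0" :Type "Num" :Value value}}]}]})

(def dictionaries
  [{:tag :XVar :attrs {:Id "cdx9" :Type "Dictionary"}
    :content [(xvar "TrancheAnalysis.IndexDuration" "3.4380728252313069")
              (xvar "TrancheAnalysis.TrancheDuration" "3.0775955481964035")]}
   ;; made-up second dictionary, for illustration only
   {:tag :XVar :attrs {:Id "example-2" :Type "Dictionary"}
    :content [(xvar "TrancheAnalysis.IndexDuration" "1.5")]}])

(mapcat dictionary-triples dictionaries)
;; => (["cdx9" "TrancheAnalysis.IndexDuration" "3.4380728252313069"]
;;     ["cdx9" "TrancheAnalysis.TrancheDuration" "3.0775955481964035"]
;;     ["example-2" "TrancheAnalysis.IndexDuration" "1.5"])
```

With map instead of mapcat, the result would be one nested sequence per dictionary; mapcat flattens them into a single sequence of triples.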

There are more examples in the test source.

My other big suggestion is to make sure you use many small functions. You will find them easier to debug, test, and work with.
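In the spirit of that advice, the last step of the question, turning row maps into the CSV shown at the top, can be its own small function that is testable without any XML at all. This is a minimal sketch; rows->csv and row->csv-line are hypothetical names, and it does no quoting, so it assumes values never contain commas, quotes, or newlines (the clojure.data.csv library handles quoting if that assumption does not hold).

```clojure
(require '[clojure.string :as str])

(defn row->csv-line
  "Renders one row map as a CSV line, taking values in `columns` order.
  Missing columns become empty fields. No quoting is done."
  [columns row]
  (str/join "," (map #(get row % "") columns)))

(defn rows->csv
  "Header line followed by one line per row map."
  [columns rows]
  (str/join "\n" (cons (str/join "," columns)
                       (map #(row->csv-line columns %) rows))))

(def columns
  ["IndexName" "TrancheAnalysis.IndexDuration" "TrancheAnalysis.TrancheDuration"])

(def rows
  [{"IndexName" "cdx9"
    "TrancheAnalysis.IndexDuration" "3.4380728252313069"
    "TrancheAnalysis.TrancheDuration" "3.0775955481964035"}])

(println (rows->csv columns rows))
;; IndexName,TrancheAnalysis.IndexDuration,TrancheAnalysis.TrancheDuration
;; cdx9,3.4380728252313069,3.0775955481964035
```

Something like (spit "out.csv" (rows->csv columns rows)) would then write the result to a file.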