How to use R to get the data I need from a CDATA section

Time: 2014-06-18 09:08:51

Tags: r cdata

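I need to extract some data from an XML file. The data sits inside a CDATA section, which makes it painful to get at. The data format looks like this: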

<?xml version="1.0"?>
<Calc>
   <MarketData>
      <![CDATA[Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,
       curve=[(1,0.1),(2,0.1)...(15,0.1)] Interestrate.EUR,Type=,Floor=<undefined>,
       Currency=EUR,curve=[...] InterestRateVol.CHF,Type=,LongRunMean=0.01,..]]>
   </MarketData>
   <Calendars>.....
   </Calendars>
</Calc>

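For example, I want to get the curve data for Interestrate.EUR. How can I do this in R? I need to first find "Interestrate.EUR" and then take the data after the first "curve" that follows it. Any good suggestions for handling CDATA? A solution in any other language would also be fine.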

1 answer:

Answer 0 (score: 1)

xData <- '<?xml version="1.0"?>
<Calc>
  <MarketData>
  <![CDATA[Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,
           curve=[(1,0.1),(2,0.1)...(15,0.1)] Interestrate.EUR,Type=,Floor=<undefined>,
           Currency=EUR,curve=[...] InterestRateVol.CHF,Type=,LongRunMean=0.01,..]]>
  </MarketData>
  <Calendars>.....
</Calendars>
  </Calc>'
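library(XML)
# Parse the XML string (asText = TRUE marks the input as XML text, not a file name)
xData <- xmlParse(xData, asText = TRUE)
# Collect the value of every text node; the CDATA content is the first one
cData <- xpathSApply(xData, "//text()", xmlValue)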
> cData[1]
[1] "Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,\n           curve=[(1,0.1),(2,0.1)...(15,0.1)] Interestrate.EUR,Type=,Floor=<undefined>,\n           Currency=EUR,curve=[...] InterestRateVol.CHF,Type=,LongRunMean=0.01,.."


# Split on single spaces; runs of indentation produce empty strings, dropped below
out <- strsplit(cData[1], ' ')[[1]]

> out[out != ""]
[1] "Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,\n"
[2] "curve=[(1,0.1),(2,0.1)...(15,0.1)]"                      
[3] "Interestrate.EUR,Type=,Floor=<undefined>,\n"             
[4] "Currency=EUR,curve=[...]"                                
[5] "InterestRateVol.CHF,Type=,LongRunMean=0.01,.."
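
The split above separates the records but does not yet isolate the curve of one named record, which is what the question asks for. A minimal base R sketch of that last step follows, under some assumptions: `get_curve()` and `parse_curve()` are hypothetical helper names (not part of the XML package), the text is the sample from the question with its "..." elisions left as they are, and the column names `x` and `y` are placeholders because the question does not say what the two numbers in each pair mean.

```r
# Sample CDATA text from the question; in practice this would be cData[1].
cdata_text <- paste0(
  "Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,\n",
  "curve=[(1,0.1),(2,0.1)...(15,0.1)] Interestrate.EUR,Type=,Floor=<undefined>,\n",
  "Currency=EUR,curve=[...] InterestRateVol.CHF,Type=,LongRunMean=0.01,.."
)

# Hypothetical helper: find the record name literally, then return the
# contents of the first "curve=[...]" that appears after it.
get_curve <- function(text, name) {
  start <- regexpr(name, text, fixed = TRUE)
  if (as.integer(start) < 0) return(NA_character_)
  # Keep only the text that follows the matched name
  rest <- substring(text, as.integer(start) + attr(start, "match.length"))
  m <- regmatches(rest, regexec("curve=\\[([^]]*)\\]", rest))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Hypothetical helper: turn "(1,0.1),(2,0.1)" into a two-column data frame.
parse_curve <- function(curve) {
  pairs <- regmatches(curve, gregexpr("\\(([^()]*)\\)", curve))[[1]]
  vals <- strsplit(gsub("[()]", "", pairs), ",", fixed = TRUE)
  data.frame(
    x = as.numeric(vapply(vals, `[`, character(1), 1)),
    y = as.numeric(vapply(vals, `[`, character(1), 2))
  )
}

eur <- get_curve(cdata_text, "Interestrate.EUR")  # "..." here, since the sample elides it
chf <- parse_curve(get_curve(cdata_text, "Interestrate.CHF"))
```

One caveat with this approach: if the named record has no curve of its own, `get_curve()` would pick up the curve of a later record, so real data may need a stricter record boundary than "first curve after the name".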