I need to pull some data out of an XML file. The data sits inside a CDATA section, which makes it painful. The format looks like this:
<?xml version="1.0"?>
<Calc>
<MarketData>
<![CDATA[Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,
curve=[(1,0.1),(2,0.1)...(15,0.1)] Interestrate.EUR,Type=,Floor=<undefined>,
Currency=EUR,curve=[...] InterestRateVol.CHF,Type=,LongRunMean=0.01,..]]>
</MarketData>
<Calendars>.....
</Calendars>
</Calc>
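For example, I want to get the curve data for Interestrate.EUR. How can I do this in R? I need to find "Interestrate.EUR" first and then take the data after the first "curve" that follows it. Any good advice for handling CDATA? A solution in any other language would be fine too.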
Answer 0 (score: 1)
xData <- '<?xml version="1.0"?>
<Calc>
<MarketData>
<![CDATA[Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,
curve=[(1,0.1),(2,0.1)...(15,0.1)] Interestrate.EUR,Type=,Floor=<undefined>,
Currency=EUR,curve=[...] InterestRateVol.CHF,Type=,LongRunMean=0.01,..]]>
</MarketData>
<Calendars>.....
</Calendars>
</Calc>'
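library(XML)
# Parse the XML string into a document object (this overwrites xData).
xData <- xmlParse(xData)
# Collect the value of every text node; here the CDATA content comes back as cData[1].
cData <- xpathSApply(xData, "//text()", xmlValue)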
> cData[1]
[1] "Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,\n curve=[(1,0.1),(2,0.1)...(15,0.1)] Interestrate.EUR,Type=,Floor=<undefined>,\n Currency=EUR,curve=[...] InterestRateVol.CHF,Type=,LongRunMean=0.01,.."
out <- strsplit(cData[1], ' ')[[1]]
> out[out != ""]
[1] "Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,\n"
[2] "curve=[(1,0.1),(2,0.1)...(15,0.1)]"
[3] "Interestrate.EUR,Type=,Floor=<undefined>,\n"
[4] "Currency=EUR,curve=[...]"
[5] "InterestRateVol.CHF,Type=,LongRunMean=0.01,.."
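The split above stops short of isolating the Interestrate.EUR curve. One possible way to finish the job is a small base R regex helper like the sketch below. `get_curve` is a hypothetical name, not part of the XML package, and the `tenor`/`rate` column labels are my own guess at what the pairs mean. Since the curve values in the question are elided, the sample string uses invented numbers and stands in for `cData[1]`. It also assumes a curve never contains a nested `]`.

```r
# Hypothetical helper (not from the XML package): return the text of the
# first "curve=[...]" that follows a given key inside the CDATA string.
get_curve <- function(txt, key) {
  # Locate the key literally (fixed = TRUE, so "." is not a regex wildcard).
  start <- as.integer(regexpr(key, txt, fixed = TRUE))
  if (start == -1L) return(NA_character_)
  # Keep only the text from the key onwards.
  tail_txt <- substring(txt, start)
  # First "curve=[" ... "]" after the key; [^]]* stops at the closing bracket.
  m <- regmatches(tail_txt, regexpr("curve=\\[[^]]*\\]", tail_txt))
  if (length(m) == 0) return(NA_character_)
  # Drop the "curve=" prefix and keep the bracketed list.
  sub("^curve=", "", m)
}

# Stand-in for cData[1]; the curve values are invented for illustration.
cdata <- paste0(
  "Interestrate.CHF,Type=,Floor=<undefined>,Currency=CHF,\n",
  "curve=[(1,0.1),(2,0.1)] Interestrate.EUR,Type=,Floor=<undefined>,\n",
  "Currency=EUR,curve=[(1,0.2),(2,0.3)] InterestRateVol.CHF,Type=,LongRunMean=0.01"
)

eur <- get_curve(cdata, "Interestrate.EUR")
print(eur)

# Turn the "(x,y)" pairs into a two-column numeric matrix.
# The column names are assumed labels, not taken from the data.
nums <- as.numeric(regmatches(eur, gregexpr("-?[0-9]+\\.?[0-9]*", eur))[[1]])
curve <- matrix(nums, ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("tenor", "rate")))
print(curve)
```

With the real data you would call `get_curve(cData[1], "Interestrate.EUR")`. Matching the key with `fixed = TRUE` keeps the search case-sensitive and literal, so `Interestrate.EUR` cannot accidentally hit an `InterestRateVol` entry.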