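Strip CDATA from XML using XSLT

Time: 2016-11-22 10:19:15

Tags: c# xml xslt

I am using XSLT to extract data from the following XML. Is there any way to strip the CDATA? Currently the CDATA is included in the extracted data.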

<Product>
<ExternalId><![CData[55037]]></ExternalId>
<Name><![CData[Reindeer Booties]]></Name>
<Description><![CData[Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.]]></Description>
<Brand>XYZ</Brand>
<CategoryExternalId>1_15_1</CategoryExternalId>
<ProductPageUrl><![CData[http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties]]></ProductPageUrl>
<ImageUrl><![CData[http://www.xyzimages.com/images/product/16S_550.jpg]]></ImageUrl>
<SwatchImageUrl><![CData[]]></SwatchImageUrl>
<Price>84.8000</Price>
<Wasprice>154.9500</Wasprice>
<ManufacturerPartNumber></ManufacturerPartNumber>
<EAN></EAN>
<Colours><![CData[blue-pink]]</Colours>
</Product>

I expect the following output:

<Product>
<ExternalId>55037</ExternalId>
<Name>Reindeer Booties</Name>
<Description>Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.</Description>
<Brand>XYZ</Brand>
<CategoryExternalId>1_15_1</CategoryExternalId>
<ProductPageUrl>http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties</ProductPageUrl>
<ImageUrl>http://www.xyzimages.com/images/product/16S_550.jpg</ImageUrl>
<SwatchImageUrl></SwatchImageUrl>
<Price>84.8000</Price>
<Wasprice>154.9500</Wasprice>
<ManufacturerPartNumber></ManufacturerPartNumber>
<EAN></EAN>
<Colours>blue-pink</Colours>
</Product>

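3 answers:

Answer 0: (score: 0)

Your real problem is that your XML is corrupted, and you should fix the source of the error rather than patch the result. CData should not sit inside an angle-bracket tag. It should start with '!' and end with ']'. The following regex will fix the error.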

using System.Xml;
using System.Xml.Linq;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication28
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
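            string xml = File.ReadAllText(FILENAME);
            // Match '<', then "![CData" plus everything up to the next '>', then that '>'.
            string pattern = @"(?'open'<)(?'cdata'!\[CData[^\>]+)(?'close'>)";
            // Keep only the middle group, dropping the surrounding '<' and '>'.
            string fixedXml = Regex.Replace(xml, pattern, "${cdata}");
            XDocument doc = XDocument.Parse(fixedXml);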
        }
    }
}

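Answer 1: (score: 0)

The input you showed us is not well-formed XML and cannot be processed by XSLT:

  • First, CDATA sections must start with <![CDATA[, not <![CData[ as you have it (XML is case-sensitive).

  • Next, a CDATA section must end with ]]>. This ending is missing on line 14 of your input (you only have ]]).

Once these defects are fixed, you have a well-formed XML input, for example: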
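
If the source of the file cannot be fixed, the two repairs listed above could be sketched in C# roughly as follows. This is only a sketch under assumptions: `CDataRepair` and `Repair` are hypothetical names, not something from the thread, and the second replacement assumes that "]]" immediately followed by a closing tag never occurs legitimately in your data.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class CDataRepair
{
    // Hypothetical helper: normalizes the two defects described above
    // so that the text can be parsed as XML.
    public static string Repair(string xml)
    {
        // 1. Fix the casing of the opening marker: <![CData[ becomes <![CDATA[
        string result = Regex.Replace(xml, @"<!\[CDATA\[", "<![CDATA[", RegexOptions.IgnoreCase);

        // 2. Add the missing '>' where "]]" is directly followed by a closing tag:
        //    ]]</Colours> becomes ]]></Colours>
        //    (assumption: "]]</" never appears as real content)
        result = Regex.Replace(result, @"\]\](?=</)", "]]>");

        return result;
    }

    static void Main()
    {
        string broken = "<Product><Name><![CData[Reindeer Booties]]></Name>"
                      + "<Colours><![CData[blue-pink]]</Colours></Product>";

        // After the repair the text parses, and the CDATA content is readable as a value.
        XDocument doc = XDocument.Parse(Repair(broken));
        Console.WriteLine(doc.Root.Element("Colours").Value);
    }
}
```

The repaired string can then be fed to an XSLT processor or to LINQ to XML like any other well-formed document.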

XML

<Product>
    <ExternalId><![CDATA[55037]]></ExternalId>
    <Name><![CDATA[Reindeer Booties]]></Name>
    <Description><![CDATA[Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.]]></Description>
    <Brand>XYZ</Brand>
    <CategoryExternalId>1_15_1</CategoryExternalId>
    <ProductPageUrl><![CDATA[http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties]]></ProductPageUrl>
    <ImageUrl><![CDATA[http://www.xyzimages.com/images/product/16S_550.jpg]]></ImageUrl>
    <SwatchImageUrl><![CDATA[]]></SwatchImageUrl>
    <Price>84.8000</Price>
    <Wasprice>154.9500</Wasprice>
    <ManufacturerPartNumber></ManufacturerPartNumber>
    <EAN></EAN>
    <Colours><![CDATA[blue-pink]]></Colours>
</Product>

Then you can apply a simple stylesheet that contains only the identity transform:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
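
Since the question is tagged c#, here is a rough sketch of how an identity stylesheet like this one might be applied with `XslCompiledTransform`. The class name `IdentityTransformDemo`, the inline stylesheet string and the sample input are assumptions made for illustration; the `xsl:output` and `xsl:strip-space` lines are left out to keep it short.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class IdentityTransformDemo
{
    // Identity template only; copies every node and attribute as-is.
    const string Stylesheet = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:template match=""@*|node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string inputXml)
    {
        // Compile the stylesheet from the string above.
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        // Run the transform, writing the result into a string.
        var output = new StringWriter();
        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }
        return output.ToString();
    }

    static void Main()
    {
        // Sample input (well-formed CDATA), made up for this sketch.
        string input = "<Product><Name><![CDATA[Reindeer Booties]]></Name></Product>";
        Console.WriteLine(Transform(input));
    }
}
```

The CDATA markers disappear because the XSLT data model treats a CDATA section as ordinary text, and the serializer writes it back as escaped text unless `cdata-section-elements` says otherwise.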

This returns:

Result

<?xml version="1.0" encoding="UTF-8"?>
<Product>
   <ExternalId>55037</ExternalId>
   <Name>Reindeer Booties</Name>
   <Description>Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.</Description>
   <Brand>XYZ</Brand>
   <CategoryExternalId>1_15_1</CategoryExternalId>
   <ProductPageUrl>http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties</ProductPageUrl>
   <ImageUrl>http://www.xyzimages.com/images/product/16S_550.jpg</ImageUrl>
   <SwatchImageUrl/>
   <Price>84.8000</Price>
   <Wasprice>154.9500</Wasprice>
   <ManufacturerPartNumber/>
   <EAN/>
   <Colours>blue-pink</Colours>
</Product>

Answer 2: (score: 0)

Since you are using C#, you can skip XSLT entirely and just use LINQ to XML.

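using System.Linq;
using System.Xml.Linq;

var doc = XDocument.Load("test.xml");

// ToList() snapshots the CDATA nodes so the tree can be modified while iterating.
foreach (var n in doc.DescendantNodes().OfType<XCData>().ToList())
{
    // Replace each CDATA node with a plain text node holding the same value.
    n.ReplaceWith(n.Value);
}

doc.Save("test2.xml");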

Of course, as michael.hor257k pointed out, your input XML has to be well-formed.
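
One way to check for that before doing any processing is to try parsing the text and catch the `XmlException` that the parser throws on malformed input. A minimal sketch, where `WellFormedCheck` and `IsWellFormed` are hypothetical names chosen for this example:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class WellFormedCheck
{
    // Returns true if the text parses as XML, false if the parser rejects it.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        // Wrong casing of the CDATA marker, as in the question: rejected.
        Console.WriteLine(IsWellFormed("<Name><![CData[Reindeer Booties]]></Name>"));

        // Correct marker: accepted.
        Console.WriteLine(IsWellFormed("<Name><![CDATA[Reindeer Booties]]></Name>"));
    }
}
```

With the input from the question this reports the problem up front, instead of failing later inside `XDocument.Load` or the XSLT processor.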