Formatting HTML input with XSLT

Date: 2012-03-28 16:36:16

Tags: html xml xslt

I want to use XSLT to strip the attributes from an HTML file. The HTML file looks like this:

<html>
    <head>
        <meta content="text/html; charset=UTF-8" http-equiv="Content-Type" />
        <title>CB Comfy Bike</title>
        <meta name="atg:string,index:$repositoryId" content="prod10005" />
        <meta name="atg:date:creationDate" content="955050507" />
        <meta name="atg:date:startDate" content="978325200" />
        <meta name="atg:date:endDate" content="1009861200" />
        <meta name="atg:string:$url"
            content="atgrep:/ProductCatalog/frame-product/prod10005?locale=en_US" />
        <meta name="atg:string,index:$baseUrl"
            content="atgrep:/ProductCatalog/frame-product/prod10005" />
        <meta name="atg:string:$repository.repositoryName" content="ProductCatalog" />
        <meta name="atg:string:$itemDescriptor.itemDescriptorName" content="frame-product" />
        <meta name="atg:string:childSKUs.$repositoryId" content="sku20007" />
        <meta name="atg:string:childSKUs.$itemDescriptor.itemDescriptorName" content="bike-sku" />
        <meta name="atg:date:childSKUs.creationDate" content="955068027" />
        <meta name="atg:float:childSKUs.listPrice" content="400.0" />
        <meta name="atg:float:childSKUs.salePrice" content="300.0" />
        <meta name="atg:boolean:childSKUs.onSale" content="false" />
        <meta name="atg:string:parentCategory.$repositoryId" content="cat55551" />
        <meta name="atg:date:parentCategory.creationDate" content="956950321" />
        <meta name="atg:string,docset:ancestorCategories.$repositoryId" content="cat10002" />
        <meta name="atg:string,docset:ancestorCategories.$repositoryId" content="cat10003" />
        <meta name="atg:string,docset:ancestorCategories.$repositoryId" content="cat55551" />
    </head>
    <body>
        <div class="atg:role:displayName" id="0"> CB Comfy Bike </div>
        <div class="atg:role:longDescription" id="1"> This bike is just right, whether you are a
            commuter or want to explore the fire roads. The plush front suspension will smooth out
            the roughest bumps and the big disc brakes provide extra stopping power for those big
            downhills. </div>
        <div class="atg:role:keywords" id="2"> mountain_bike comfort_bike </div>
        <div class="atg:role:childSKUs.displayName" id="3"> CB Comfy Bike Medium </div>
        <div class="atg:role:childSKUs.listPrice" id="4"> 400.0 </div>
        <div class="atg:role:childSKUs.description" id="5"> Medium </div>
        <div class="atg:role:parentCategory.displayName" id="6"> Mountain Bikes </div>
    </body>
</html>

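I want a new tag for each div. I haven't settled on a naming convention yet, since this is only a proof of concept. However, I don't know how to tell the div tags apart. This is the XSLT I have so far: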

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="head"/>

    <xsl:template match="body">
        <xsl:copy>
            <xsl:value-of select="div/text()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

It returns:

<?xml version="1.0" encoding="utf-8"?>
<html>
    <body> CB Comfy Bike </body>
</html>

How can I turn the input into something like this?

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <tag1>CB Comfy Bike</tag1>
    <tag2>This bike is just right, whether you are a
        commuter or want to explore the fire roads. The plush front suspension will smooth out
        the roughest bumps and the big disc brakes provide extra stopping power for those big
        downhills.</tag2>
    <tag3>mountain_bike comfort_bike</tag3>
    <tag4>CB Comfy Bike Medium</tag4>
    <tag5>400.0</tag5>
    <tag6>Medium</tag6>
    <tag7>Mountain Bikes</tag7>
</root>

The trouble I'm having is distinguishing the div tags.

3 answers:

Answer 0 (score: 0)

(This produces almost the desired output.)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="head"/>

  <xsl:template match="body">
    <xsl:for-each select="div">
      <xsl:element name="{concat('tag', position())}">
        <xsl:value-of select="./text()"/>
      </xsl:element>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
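
As written, the stylesheet above has no template that creates a root element, and the identity template copies html, so the generated tagN elements end up inside html rather than root. A minimal sketch of one way to close that gap while keeping the for-each approach, assuming the input elements are in no namespace, as in the question:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Start at the document root so nothing outside body/div is copied -->
  <xsl:template match="/">
    <root>
      <!-- position() is the index of each div within the selected list -->
      <xsl:for-each select="html/body/div">
        <xsl:element name="{concat('tag', position())}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
    </root>
  </xsl:template>

</xsl:stylesheet>
```

Because only html/body/div is selected, the identity template and the empty head template are no longer needed in this variant.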

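Answer 1 (score: 0)

I would do something similar to Timur, but without for-each. I would use templates to iterate over the divs in the body.

Applying the following to the provided XML: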

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="html">
        <xsl:apply-templates select="body"/>
    </xsl:template>

    <xsl:template match="body">
        <root>
            <xsl:apply-templates select="div"/>
        </root>
    </xsl:template>

    <xsl:template match="div">
        <xsl:element name="{concat('tag', position())}">
            <xsl:value-of select="."/>
        </xsl:element>
    </xsl:template>

</xsl:stylesheet>

produces the desired output:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <tag1> CB Comfy Bike </tag1>
    <tag2> This bike is just right, whether you are a
        commuter or want to explore the fire roads. The plush front suspension will smooth out
        the roughest bumps and the big disc brakes provide extra stopping power for those big
        downhills. </tag2>
    <tag3> mountain_bike comfort_bike </tag3>
    <tag4> CB Comfy Bike Medium </tag4>
    <tag5> 400.0 </tag5>
    <tag6> Medium </tag6>
    <tag7> Mountain Bikes </tag7>
</root>

This uses:

  1. The identity transform
  2. Attribute value templates
  3. apply-templates to iterate over the child nodes
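
The output above keeps the padding spaces inside each div, whereas the output requested in the question has them trimmed. A minimal sketch of one way to trim them, assuming it is also acceptable to collapse the line breaks inside the long description into single spaces (normalize-space() does both), and assuming the input is in no namespace:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <root>
            <xsl:apply-templates select="html/body/div"/>
        </root>
    </xsl:template>

    <xsl:template match="div">
        <xsl:element name="{concat('tag', position())}">
            <!-- normalize-space() strips leading/trailing whitespace
                 and collapses inner runs of whitespace to one space -->
            <xsl:value-of select="normalize-space(.)"/>
        </xsl:element>
    </xsl:template>

</xsl:stylesheet>
```

If the original line breaks inside the text must be preserved, this is not a drop-in replacement, because normalize-space() removes them.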

Answer 2 (score: 0)

This short and simple transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
  <root>
   <xsl:apply-templates select="body/div"/>
  </root>
 </xsl:template>

 <xsl:template match="div">
  <xsl:element name="tag{position()}">
   <xsl:value-of select="."/>
  </xsl:element>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document (the HTML from the question)

produces the wanted, correct result:

<root>
   <tag1> CB Comfy Bike </tag1>
   <tag2> This bike is just right, whether you are a
                commuter or want to explore the fire roads. The plush front suspension will smooth out
                the roughest bumps and the big disc brakes provide extra stopping power for those big
                downhills. </tag2>
   <tag3> mountain_bike comfort_bike </tag3>
   <tag4> CB Comfy Bike Medium </tag4>
   <tag5> 400.0 </tag5>
   <tag6> Medium </tag6>
   <tag7> Mountain Bikes </tag7>
</root>

Explanation:

  1. Proper use of templates.

  2. Use of the position() function.

  3. Use of AVTs (attribute value templates).
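
One point worth keeping in mind about position(): it reports the node's place in the list selected by the calling xsl:apply-templates (or xsl:for-each), not its place in the document. A hedged sketch of an alternative that numbers each div by counting its preceding div siblings, so the result does not depend on how the template is invoked (again assuming the input is in no namespace):

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
  <root>
   <xsl:apply-templates select="body/div"/>
  </root>
 </xsl:template>

 <xsl:template match="div">
  <!-- Index = number of div siblings before this one, plus one -->
  <xsl:element name="tag{count(preceding-sibling::div) + 1}">
   <xsl:value-of select="."/>
  </xsl:element>
 </xsl:template>
</xsl:stylesheet>
```

For the input in the question both approaches give tag1 through tag7, because body/div selects every div in document order. They would differ if, for example, only some of the divs were selected or the selection were sorted.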