I have an XML file that lists 92 tab-delimited text files:
<?xml version="1.0" encoding="UTF-8"?>
<dumpSet>
<dump filename="file_one.txt"/>
<dump filename="file_two.txt"/>
<dump filename="file_three.txt"/>
...
</dumpSet>
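The first line of each file contains the field names for the rows that follow. This is just an example; the names and number of elements will vary depending on the record. Most will have about 50 field names.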
Title Translated Title Watch Video Interviewee Interviewer
Interview with Barack Obama Obama, Barack Walters, Barbara
Interview with Sarah Palin Palin, Sarah Couric, Katie Smith, John
...
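Oxygen XML Editor has an Import function that converts text files to XML but, as far as I know, it cannot be run as a batch process over multiple files. So far, the batch part has not been a problem: I am using XSLT 2.0's unparsed-text() function to pull in the content of the files in the list. However, I am struggling to group the XML output correctly. An example of the desired output: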
<collection>
<record>
<title>Interview with Barack Obama</title>
<translatedtitle></translatedtitle>
<watchvideo></watchvideo>
<interviewee>Obama, Barack</interviewee>
<interviewer>Walters, Barbara</interviewer>
<videographer>Smith, John</videographer>
</record>
<record>
<title>Interview with Sarah Palin</title>
<translatedtitle></translatedtitle>
<watchvideo></watchvideo>
<interviewee>Palin, Sarah</interviewee>
<interviewer>Couric, Katie</interviewer>
<videographer>Smith, John</videographer>
</record>
...
</collection>
Right now, this is the kind of output I am getting:
<collection>
<record>
<title>title</title>
<value>Interview with Barack Obama</value>
<value>Interview with Sarah Palin</value>
<translatedtitle>translatedtitle</translatedtitle>
<value/>
<value/>
<watchvideo>watchvideo</watchvideo>
<value/>
<value/>
<interviewee>interviewee</interviewee>
<value>Obama, Barack</value>
<value>Palin, Sarah</value>
<interviewer>interviewer</interviewer>
<value>Walters, Barbara</value>
<value>Couric, Katie</value>
<videographer>videographer</videographer>
<value>Smith, John</value>
<value>Smith, John </value>
<value/>
<value/>
</record>
</collection>
That is, I cannot group the output by record. This is the code I am currently using, based on an example from Doug Tidwell's XSLT book:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all" version="2.0">
<xsl:param name="i" select="1"/>
<xsl:param name="increment" select="1"/>
<xsl:param name="operator" select="'&lt;='"/>
<xsl:param name="testVal" select="100"/>
<xsl:template match="/">
<collections>
<collection>
<xsl:for-each select="dumpSet/dump">
<!-- Pull in external tab-delimited files -->
<xsl:for-each select="unparsed-text(concat('../2013-04-26/',@filename),'UTF-8')">
<record>
<!-- Call recursive template to loop through elements. -->
<xsl:call-template name="for-loop">
<xsl:with-param name="i" select="$i"/>
<xsl:with-param name="increment" select="$increment"/>
<xsl:with-param name="operator" select="$operator"/>
<xsl:with-param name="testVal" select="$testVal"/>
</xsl:call-template>
</record>
</xsl:for-each>
</xsl:for-each>
</collection>
</collections>
</xsl:template>
<xsl:template name="for-loop">
<xsl:param name="i"/>
<xsl:param name="increment"/>
<xsl:param name="operator"/>
<xsl:param name="testVal"/>
<xsl:variable name="testPassed">
<xsl:choose>
<xsl:when test="$operator = '&lt;='">
<xsl:if test="$i &lt;= $testVal">
<xsl:text>true</xsl:text>
</xsl:if>
</xsl:when>
</xsl:choose>
</xsl:variable>
<xsl:if test="$testPassed = 'true'">
<!-- Separate the header from the tab-delimited file. -->
<xsl:for-each select="tokenize(.,'\r|\n')[1]">
<!-- Spit out the field names. -->
<xsl:for-each select="tokenize(.,'\t')[$i]">
<xsl:element name="{replace(lower-case(translate(.,'-.','')),' ','')}">
<xsl:value-of select="replace(lower-case(translate(.,'-.','')),' ','')"/>
</xsl:element>
</xsl:for-each>
</xsl:for-each>
<!-- For the following rows, loop through the field values. -->
<xsl:for-each select="tokenize(.,'\r|\n')[position()>1]">
<xsl:for-each select="tokenize(.,'\t')[$i]">
<value>
<xsl:value-of select="."/>
</value>
</xsl:for-each>
</xsl:for-each>
<!-- Call the template to increment. -->
<xsl:call-template name="for-loop">
<xsl:with-param name="i" select="$i + $increment"/>
<xsl:with-param name="increment" select="$increment"/>
<xsl:with-param name="operator" select="$operator"/>
<xsl:with-param name="testVal" select="$testVal"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
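How should I change this so that the output is grouped by record?
Answer 0 (score: 0)
Try this XSLT to see how you might adapt it to your needs. You will need to add your translate() calls wherever they are required.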
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<collections>
<collection>
<xsl:for-each select="dumpSet/dump">
<xsl:for-each select="tokenize(unparsed-text(@filename,'UTF-8'),'\n')[not(position()=1)]">
<record>
<title><xsl:value-of select="tokenize(.,'\t')[1]"/></title>
<translatedtitle><xsl:value-of select="tokenize(.,'\t')[2]"/></translatedtitle>
<watchvideo><xsl:value-of select="tokenize(.,'\t')[3]"/></watchvideo>
<interviewee><xsl:value-of select="tokenize(.,'\t')[4]"/></interviewee>
<interviewer><xsl:value-of select="tokenize(.,'\t')[5]"/></interviewer>
<videographer><xsl:value-of select="tokenize(.,'\t')[6]"/></videographer>
</record>
</xsl:for-each>
</xsl:for-each>
</collection>
</collections>
</xsl:template>
</xsl:stylesheet>
Output:
<collections xmlns:xs="http://www.w3.org/2001/XMLSchema">
<collection>
<record>
<title>Interview with Barack Obama</title>
<translatedtitle/>
<watchvideo>Obama, Barack</watchvideo>
<interviewee>Walters, Barbara</interviewee>
<interviewer>
</interviewer>
<videographer/>
</record>
<record>
<title>Interview with Sarah Palin</title>
<translatedtitle/>
<watchvideo>Palin, Sarah</watchvideo>
<interviewee>Couric, Katie</interviewee>
<interviewer>Smith, John</interviewer>
<videographer/>
</record>
</collection>
</collections>
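Because the question says the field names and their count vary from file to file, the hard-coded element names above could instead be read from each file's first line. The following is only a minimal sketch under some assumptions: every file is tab-delimited, every cleaned-up header is non-empty and a valid XML element name, and blank lines can be ignored. The `$lines`, `$names`, `$values`, and `$pos` variables are illustrative names, not part of the answer above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <collection>
      <xsl:for-each select="dumpSet/dump">
        <!-- Split the file into lines and drop blank ones. -->
        <xsl:variable name="lines"
          select="tokenize(unparsed-text(@filename, 'UTF-8'), '\r?\n')[normalize-space(.)]"/>
        <!-- Derive element names from the header row, using the
             same clean-up as the question's stylesheet. -->
        <xsl:variable name="names"
          select="for $h in tokenize($lines[1], '\t')
                  return replace(lower-case(translate($h, '-.', '')), ' ', '')"/>
        <!-- Outer loop over data rows, so each row becomes one record. -->
        <xsl:for-each select="$lines[position() gt 1]">
          <xsl:variable name="values" select="tokenize(., '\t')"/>
          <record>
            <!-- Inner loop over columns; a missing value yields an empty element. -->
            <xsl:for-each select="$names">
              <xsl:variable name="pos" select="position()"/>
              <xsl:element name="{.}">
                <xsl:value-of select="$values[$pos]"/>
              </xsl:element>
            </xsl:for-each>
          </record>
        </xsl:for-each>
      </xsl:for-each>
    </collection>
  </xsl:template>
</xsl:stylesheet>
```

The key difference from the question's recursive template is the loop order: rows are iterated on the outside and columns on the inside, which is what groups the output by record rather than by field.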
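Answer 1 (score: 0)
It might be easier if you use xsl:analyze-string to parse each record. There may be a better way to get the element names from the header, but I haven't had time to think about that.
Notes:
You may need to change the encoding for unparsed-text(). I usually pass the encoding as a parameter so I don't have to modify the stylesheet. Maybe the encoding could be added to <dump/>?
It would be best to use unparsed-text-available() to check that the file exists and can be read with the specified encoding.
Also, you may want to check that the values in the header are valid QNames. For example, an apostrophe in a header would cause an error. It might be better to use the field names from the header as attribute values instead of element names (like: <field name="Interviewee">Obama, Barack</field>).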
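The three notes above could be combined roughly as follows. This is a hedged sketch, not the stylesheet from this answer: the `encoding` stylesheet parameter, the optional `encoding` attribute on `<dump/>`, and the `$enc`, `$lines`, `$headers`, and `$values` variables are all assumptions introduced for illustration. It splits lines with tokenize() rather than xsl:analyze-string, and it writes header text into a `name` attribute so that headers need not be valid element names.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <!-- Hypothetical parameter: fallback encoding when <dump> has no encoding attribute. -->
  <xsl:param name="encoding" select="'UTF-8'"/>

  <xsl:template match="dumpSet">
    <collection>
      <xsl:apply-templates select="dump[@filename]"/>
    </collection>
  </xsl:template>

  <xsl:template match="dump">
    <!-- An encoding attribute on <dump> is an assumption, not part of the original input. -->
    <xsl:variable name="enc" select="(@encoding, $encoding)[1]"/>
    <xsl:choose>
      <!-- Only read the file if it exists and decodes with the chosen encoding. -->
      <xsl:when test="unparsed-text-available(@filename, $enc)">
        <xsl:variable name="lines"
          select="tokenize(unparsed-text(@filename, $enc), '\r?\n')[normalize-space(.)]"/>
        <xsl:variable name="headers" select="tokenize($lines[1], '\t')"/>
        <xsl:for-each select="$lines[position() gt 1]">
          <xsl:variable name="values" select="tokenize(., '\t')"/>
          <record>
            <xsl:for-each select="$headers">
              <xsl:variable name="pos" select="position()"/>
              <!-- Header text goes into an attribute, so apostrophes
                   and spaces in field names are harmless. -->
              <field name="{normalize-space(.)}">
                <xsl:value-of select="normalize-space($values[$pos])"/>
              </field>
            </xsl:for-each>
          </record>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message>Skipping <xsl:value-of select="@filename"/>: not readable as <xsl:value-of select="$enc"/></xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With this layout a different encoding can be supplied when the transformation is invoked, without editing the stylesheet, and an unreadable file produces a message instead of aborting the whole batch.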
Here is my example:
XML Input
<dumpSet>
<dump filename="file_one.txt"/>
</dumpSet>
file_one.txt
Title Translated Title Watch Video Interviewee Interviewer Videographer
Interview with Barack Obama Obama, Barack Walters, Barbara
Interview with Sarah Palin Palin, Sarah Couric, Katie Smith, John
XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="dumpSet">
<collection>
<xsl:apply-templates select="dump[@filename]"/>
</collection>
</xsl:template>
<xsl:template match="dump">
<xsl:variable name="text" select="unparsed-text(@filename, 'iso-8859-1')"/>
<xsl:variable name="header">
<xsl:analyze-string select="$text" regex="(..*)">
<xsl:matching-substring>
<xsl:if test="position()=1">
<xsl:value-of select="regex-group(1)"/>
</xsl:if>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:variable name="headerTokens" select="tokenize($header,'\t')"/>
<xsl:analyze-string select="$text" regex="(..*)">
<xsl:matching-substring>
<xsl:if test="not(position()=1)">
<record>
<xsl:analyze-string select="." regex="([^\t][^\t]*)\t?|\t">
<xsl:matching-substring>
<xsl:variable name="pos" select="position()"/>
<xsl:element name="{replace(normalize-space(lower-case($headerTokens[$pos])),' ','')}">
<xsl:value-of select="normalize-space(regex-group(1))"/>
</xsl:element>
</xsl:matching-substring>
</xsl:analyze-string>
</record>
</xsl:if>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
Output
<collection>
<record>
<title>Interview with Barack Obama</title>
<translatedtitle/>
<watchvideo/>
<interviewee>Obama, Barack</interviewee>
<interviewer>Walters, Barbara</interviewer>
</record>
<record>
<title>Interview with Sarah Palin</title>
<translatedtitle/>
<watchvideo/>
<interviewee>Palin, Sarah</interviewee>
<interviewer>Couric, Katie</interviewer>
<videographer>Smith, John</videographer>
</record>
</collection>