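There are lots of entries and answers out there about this, but they all go in the opposite direction from what I need. My iTunes XML has hundreds of %-encoded entries in several languages, and I'm trying to convert them to Unicode text with an XSLT stylesheet. Is there a function or procedure I'm missing, other than tracking down each character and doing a replace? Below is a small sample of the kinds of entries I'm working with. The first line of each pair is the XML string value, and the second is the plain text I'm trying to generate and write to a text file.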
<string>/iTunes/iTunes%20Music/Droit%20devant/L'odysse%CC%81e.mp3</string>
/iTunes/iTunes Music/Droit devant/L'odyssée.mp3
<string>A%CC%80%20la%20Pe%CC%82che</string>
À la Pêche
<string>%D0%97%D0%B0%D0%BF%D0%BE%D0%BC%D0%B8%D0%BD%D0%B0%D0%B8%CC%86</string>
Запоминай
<string>%CE%9A%CE%BF%CC%81%CF%84%CF%83%CC%8C%CE%B1%CF%81%CE%B9</string>
Κότσ̌αρι
The last one may not display properly for some people because of the combining hacek/caron.
Thanks in advance for any suggestions or leads.
Answer 0 (score: 6)
A pure XSLT 2.0 solution can use the string-to-codepoints() and codepoints-to-string() functions. The UTF-8 decoding is a bit messy, but it can be done.
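To see the UTF-8 arithmetic on its own, here is a rough Java sketch of the same lead-byte decoding (an illustration only, not part of this answer; the class and method names are made up). It assumes well-formed input, so a truncated multi-byte sequence throws an exception, and unlike the stylesheet it drops stray continuation bytes (128-191) instead of treating them as lead bytes.

```java
import java.util.ArrayList;
import java.util.List;

public class Utf8Decode {

    // Turn UTF-8 byte values (0-255) into Unicode code points using the same
    // lead-byte ranges as so:utf8decode. Assumes well-formed input: a truncated
    // multi-byte sequence throws ArrayIndexOutOfBoundsException.
    static List<Integer> utf8Decode(int[] bytes) {
        List<Integer> codePoints = new ArrayList<>();
        int i = 0;
        while (i < bytes.length) {
            int b = bytes[i];
            if (b == 0) {
                i += 1;                              // NUL is not allowed in XML: drop it
            } else if (b <= 127) {
                codePoints.add(b);                   // 0xxxxxxx: ASCII
                i += 1;
            } else if (b < 192) {
                i += 1;                              // stray continuation byte: drop it
            } else if (b < 224) {                    // 110xxxxx 10xxxxxx
                codePoints.add((b - 192) * 64 + (bytes[i + 1] - 128));
                i += 2;
            } else if (b < 240) {                    // 1110xxxx + 2 continuation bytes
                codePoints.add((b - 224) * 4096
                        + (bytes[i + 1] - 128) * 64
                        + (bytes[i + 2] - 128));
                i += 3;
            } else if (b < 248) {                    // 11110xxx + 3 continuation bytes
                codePoints.add((b - 240) * 262144
                        + (bytes[i + 1] - 128) * 4096
                        + (bytes[i + 2] - 128) * 64
                        + (bytes[i + 3] - 128));
                i += 4;
            } else {
                i += 1;                              // 248-255: never a lead byte
            }
        }
        return codePoints;
    }

    public static void main(String[] args) {
        // "e%CC%81" from L'odysse%CC%81e: 'e' then U+0301 COMBINING ACUTE ACCENT
        for (int cp : utf8Decode(new int[] {0x65, 0xCC, 0x81})) {
            System.out.printf("U+%04X%n", cp);      // U+0065, then U+0301
        }
    }
}
```

Each continuation byte contributes 6 bits (hence the multiples of 64), and subtracting 192, 224 or 240 strips the length marker from the lead byte.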
This XSLT 2.0 stylesheet...
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:so="http://stackoverflow.com/questions/13768754"
exclude-result-prefixes="xsl xs so">
<xsl:output encoding="UTF-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*"/>
<!-- Code points of '0' (48) and 'A' (65), used to turn hex digits into values. -->
<xsl:variable name="cp-base" select="string-to-codepoints('0A')" as="xs:integer+" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:function name="so:utf8decode" as="xs:integer*">
<xsl:param name="bytes" as="xs:integer*" />
<xsl:choose>
<xsl:when test="empty($bytes)" />
<xsl:when test="$bytes[1] eq 0"><!-- The null character is not valid for XML. -->
<xsl:sequence select="so:utf8decode( remove( $bytes, 1))" />
</xsl:when>
<xsl:when test="$bytes[1] le 127">
<xsl:sequence select="$bytes[1], so:utf8decode( remove( $bytes, 1))" />
</xsl:when>
<xsl:when test="$bytes[1] lt 224">
<xsl:sequence select="
((($bytes[1] - 192) * 64) +
($bytes[2] - 128) ),
so:utf8decode( remove( remove( $bytes, 1), 1))" />
</xsl:when>
<xsl:when test="$bytes[1] lt 240">
<xsl:sequence select="
((($bytes[1] - 224) * 4096) +
(($bytes[2] - 128) * 64) +
($bytes[3] - 128) ),
so:utf8decode( remove( remove( remove( $bytes, 1), 1), 1))" />
</xsl:when>
<!-- 4-byte sequence: lead byte 11110xxx (240-247). -->
<xsl:when test="$bytes[1] lt 248">
<xsl:sequence select="
((($bytes[1] - 240) * 262144) +
(($bytes[2] - 128) * 4096) +
(($bytes[3] - 128) * 64) +
($bytes[4] - 128) ),
so:utf8decode( $bytes[position() gt 4])" />
</xsl:when>
<xsl:otherwise>
<!-- Bytes 248-255 are never valid UTF-8 lead bytes; skip them. -->
<xsl:sequence select="so:utf8decode( remove( $bytes, 1))" />
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:template match="string/text()">
<xsl:analyze-string select="." regex="(%[0-9A-F]{{2}})+" flags="i">
<xsl:matching-substring>
<xsl:variable name="utf8-bytes" as="xs:integer+">
<xsl:analyze-string select="." regex="%([0-9A-F]{{2}})" flags="i">
<xsl:matching-substring>
<xsl:variable name="nibble-pair" select="
for $nibble-char in string-to-codepoints( upper-case(regex-group(1))) return
if ($nibble-char ge $cp-base[2]) then
$nibble-char - $cp-base[2] + 10
else
$nibble-char - $cp-base[1]" as="xs:integer+" />
<xsl:sequence select="$nibble-pair[1] * 16 + $nibble-pair[2]" />
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:value-of select="codepoints-to-string( so:utf8decode( $utf8-bytes))" />
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="." />
</xsl:non-matching-substring>
<xsl:fallback>
<!-- For XSLT 1.0 operating in forward compatibility mode,
just echo -->
<xsl:value-of select="." />
</xsl:fallback>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
...when applied to this input...
<doc>
<string>/iTunes/iTunes%20Music/Droit%20devant/L'odysse%CC%81e.mp3</string>
<string>A%Cc%80%20la%20Pe%CC%82che</string>
<string>%D0%97%D0%B0%D0%BF%D0%BE%D0%BC%D0%B8%D0%BD%D0%B0%D0%B8%CC%86</string>
<string>%CE%9A%CE%BF%CC%81%CF%84%CF%83%CC%8C%CE%B1%CF%81%CE%B9</string>
</doc>
...produces:
<doc>
<string>/iTunes/iTunes Music/Droit devant/L'odyssée.mp3</string>
<string>À la Pêche</string>
<string>Запоминай</string>
<string>Κότσ̌αρι</string>
</doc>
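Outside XSLT, the manual arithmetic is unnecessary once you have the bytes. Here is a minimal Java sketch of the same two-level approach as the template (find whole runs of %XX escapes, turn each pair of hex digits into a byte, then decode the run), assuming the escapes encode UTF-8. `PercentRuns` and `decode` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PercentRuns {

    // A run of one or more %XX escapes, like the outer xsl:analyze-string regex.
    private static final Pattern RUN = Pattern.compile("(?:%[0-9A-Fa-f]{2})+");

    // Value of one hex digit, mirroring the cp-base arithmetic on '0' and 'A'.
    private static int hexValue(char c) {
        char u = Character.toUpperCase(c);
        return u >= 'A' ? u - 'A' + 10 : u - '0';
    }

    static String decode(String s) {
        Matcher m = RUN.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());            // non-matching text: copy as-is
            String run = m.group();
            byte[] bytes = new byte[run.length() / 3];  // each escape is 3 chars
            for (int i = 0; i < bytes.length; i++) {
                int hi = hexValue(run.charAt(3 * i + 1));
                int lo = hexValue(run.charAt(3 * i + 2));
                bytes[i] = (byte) (hi * 16 + lo);
            }
            // Decode the whole run at once so multi-byte sequences stay intact.
            out.append(new String(bytes, StandardCharsets.UTF_8));
            last = m.end();
        }
        out.append(s.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        // "A" + U+0300, " la ", "Pe" + U+0302, "che"
        System.out.println(decode("A%Cc%80%20la%20Pe%CC%82che"));
    }
}
```

Decoding each run as a whole matters because one character, such as `%CC%81`, spans several escapes. One behavioral difference from `so:utf8decode`: `new String(bytes, StandardCharsets.UTF_8)` replaces malformed byte sequences with U+FFFD instead of silently dropping them.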
Answer 1 (score: 2)
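Here is an option that uses the Java method java.net.URLDecoder.decode, but you will have to upgrade to Saxon-PE (or EE) or downgrade to Saxon-B, since Saxon-HE does not support calling Java methods as extension functions.
Saxon-B is free and is still an XSLT 2.0 processor. Both can be found here: http://saxon.sourceforge.net/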
Example...
XML input
<doc>
<string>/iTunes/iTunes%20Music/Droit%20devant/L'odysse%CC%81e.mp3</string>
<string>A%CC%80%20la%20Pe%CC%82che</string>
<string>%D0%97%D0%B0%D0%BF%D0%BE%D0%BC%D0%B8%D0%BD%D0%B0%D0%B8%CC%86</string>
<string>%CE%9A%CE%BF%CC%81%CF%84%CF%83%CC%8C%CE%B1%CF%81%CE%B9</string>
</doc>
XSLT 2.0 (tested with Saxon-PE 9.4 and Saxon-B 9.1)
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:java-urldecode="java.net.URLDecoder">
<xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="string">
<xsl:value-of select="java-urldecode:decode(.,'UTF-8')"/>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
Output
/iTunes/iTunes Music/Droit devant/L'odyssée.mp3
À la Pêche
Запоминай
Κότσ̌αρι
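The same method can also be called from plain Java. One caveat, which applies to the extension-function call as well: `URLDecoder` decodes `application/x-www-form-urlencoded` data, so a literal `+` in a path is turned into a space, and a `%` not followed by two hex digits causes an `IllegalArgumentException`. A small sketch that escapes `+` before decoding (`UrlDecodeDemo` and `decodePath` are hypothetical names):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeDemo {

    // URLDecoder treats '+' as an encoded space (form encoding). Escaping it
    // as %2B first keeps a literal '+' in file paths intact.
    static String decodePath(String s) throws UnsupportedEncodingException {
        return URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(URLDecoder.decode("Rock+Roll", "UTF-8")); // Rock Roll
        System.out.println(decodePath("Rock+Roll"));                 // Rock+Roll
        System.out.println(decodePath("A%CC%80%20la%20Pe%CC%82che"));
    }
}
```

The `String` charset-name overload is used here because it matches the `decode(., 'UTF-8')` call in the stylesheet.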