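I inherited some XML files in which every tag is uppercase. I would like to convert them to lowercase, either with a regular expression or with XSLT. It would be handy to know both approaches. Unfortunately, I sometimes find regex and XSLT syntax confusing, but I'm working on it. :)
(Edit: added a contrived example)
Before: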
<?xml version="1.0"?>
<NOVEL TITLE="Now That's A Novel Title" AUTHOR="Harry Handelbar">
<PREFACE> <!-- XHTML FORMATTED TEXT -->
<P>It would be remiss of me to neglect to thank the bottle.</P>
</PREFACE>
<CHAPTER TITLE="" TYPE="NUM">
<PROLOGUE>Success, like death, marks the end of... </PROLOGUE>
<MAINTEXT> <!-- XHTML FORMATTED TEXT -->
<P>It seems a violent betrayal, me divulging how...</P>
<P>The years had not been kind Felix Lake. His constant...</P>
</MAINTEXT>
</CHAPTER>
<CHAPTER TITLE="" TYPE="NUM">
<MAINTEXT> <!-- XHTML FORMATTED TEXT -->
<P>As luck would not have it, he did.</P>
<!-- ECT ECT ECT -->
</MAINTEXT>
</CHAPTER>
</NOVEL>
After:
<?xml version="1.0"?>
<novel title="Now That's A Novel Title" author="Harry Handelbar">
<preface> <!-- XHTML FORMATTED TEXT -->
<p>It would be remiss of me to neglect to thank the bottle.</p>
</preface>
<chapter title="" type="NUM">
<prologue>Success, like death, marks the end of... </prologue>
<maintext> <!-- XHTML FORMATTED TEXT -->
<p>It seems a violent betrayal, me divulging how...</p>
<p>The years had not been kind Felix Lake. His constant...</p>
</maintext>
</chapter>
<chapter title="" type="NUM">
<maintext> <!-- XHTML FORMATTED TEXT -->
<p>As luck would not have it, he did.</p>
<!-- ECT ECT ECT -->
</maintext>
</chapter>
</novel>
Hope that helps.
(Edit: my mistake on the P tags; they should be lowercase in the After version as well.)
Answer 0: (score: 1)
Try this (untested):
XSLT 2.0:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*">
<xsl:element name="{lower-case(local-name())}" namespace="{namespace-uri()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{lower-case(local-name())}" namespace="{namespace-uri()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
<xsl:template match="comment() | text() | processing-instruction()">
<xsl:copy/>
</xsl:template>
</xsl:stylesheet>
An XSLT 1.0 version of the above would look like this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:template match="*">
<xsl:element name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
<xsl:template match="comment() | text() | processing-instruction()">
<xsl:copy/>
</xsl:template>
</xsl:stylesheet>
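Note, however, that the XSLT 1.0 version assumes your element and attribute names contain no uppercase characters other than the 26 explicitly listed (i.e. no Russian, Greek, accented letters, etc.).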
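For comparison, here is a minimal Python sketch of the same transformation that does not depend on a fixed 26-letter table. It is not part of the XSLT answer: it uses the standard library's xml.etree.ElementTree, and the function name lowercase_names is made up for illustration. It assumes the documents use no namespaces, and note that the default parser drops comments and processing instructions, which the stylesheets above preserve.

```python
import xml.etree.ElementTree as ET


def lowercase_names(xml_text: str) -> str:
    """Return xml_text with every element and attribute name lowercased.

    Assumptions (not from the original answer): no XML namespaces are in
    use, and losing comments / processing instructions is acceptable.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # str.lower() is Unicode-aware, so names are not limited to A-Z.
        elem.tag = elem.tag.lower()
        # Rebuild the attributes with lowercased names; values are untouched.
        attributes = list(elem.attrib.items())
        elem.attrib.clear()
        for name, value in attributes:
            elem.set(name.lower(), value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(lowercase_names('<CHAPTER TITLE="" TYPE="NUM"><P>Text</P></CHAPTER>'))
```

Attribute values such as "NUM" are left as they are, matching the After example in the question.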
Answer 1: (score: 0)
Answer 2: (score: 0)
With PHP, you can do it like this:
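<?php
// Matches "<" or "</" followed by a tag name.
// Note: only tag names are lowercased; attribute names are left as they are.
$pattern = '/<\/?\w+/';
$fp = fopen("/Applications/XAMPP/htdocs/test/test.xml", "r") or die("can't open file");
// fgets() returns false at end of file, which ends the loop.
while (($line = fgets($fp)) !== false) {
$line = preg_replace_callback(
$pattern,
function ($matches) {
return strtolower($matches[0]);
},
$line
);
// htmlentities() escapes the markup so a browser shows it as text.
echo htmlentities($line);
}
fclose($fp);
?>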
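Works fine ;)
Answer 3: (score: 0)
It seems to me that you need two regular expressions: one to convert the tag names, and another to convert the names in the variable number of attribute name/value pairs.
Here is how I did it: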
blah:tmp shreyas$ cat old.xml | perl -pe "s|(</?)([^> ]+)(.*?>)|\1\L\2\E\3|g" | perl -pe "s|(\w+)( ?= ?\".*?\")|\L\1\E\2|g" > processed.xml
blah:tmp shreyas$ diff new.xml processed.xml
4c4
< <P>It would be remiss of me to neglect to thank the bottle.</P>
---
> <p>It would be remiss of me to neglect to thank the bottle.</p>
9,10c9,10
< <P>It seems a violent betrayal, me divulging how...</P>
< <P>The years had not been kind Felix Lake. His constant...</P>
---
> <p>It seems a violent betrayal, me divulging how...</p>
> <p>The years had not been kind Felix Lake. His constant...</p>
15c15
< <P>As luck would not have it, he did.</P>
---
> <p>As luck would not have it, he did.</p>
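old.xml is your Before XML, and new.xml is your After XML. processed.xml is what the command produced.
As you can see, the P tags in your After XML are still uppercase. I wasn't sure whether those were typos or intended exceptions. Since you said you wanted all tags changed to lowercase, I treated them as typos.
With a few small modifications, you can run these commands over your whole set of inherited XML files and convert them quickly.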
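The two-pass idea above (one expression for tag names, one for attribute names) can also be sketched in Python. This is an illustrative rewrite, not the Perl command from this answer: the patterns and the function name lowercase_markup are my own, and the sketch assumes attribute values contain no ">" character and that comments or CDATA sections contain nothing that looks like a tag.

```python
import re

# Pass 1: "<" or "</" followed by a tag name. "<?xml" and "<!--" do not match.
TAG_NAME = re.compile(r'(</?)([\w:.\-]+)')
# A whole start tag; assumes no ">" appears inside attribute values.
START_TAG = re.compile(r'<[A-Za-z_][^<>]*>')
# Pass 2: an attribute name followed by "=" and a quoted value.
ATTR_NAME = re.compile(r'''([\w:.\-]+)(\s*=\s*(?:"[^"]*"|'[^']*'))''')


def lowercase_markup(text: str) -> str:
    """Lowercase tag and attribute names; values and text stay unchanged."""
    # Pass 1: lowercase the name right after "<" or "</".
    text = TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), text)

    # Pass 2: lowercase attribute names, but only inside start tags,
    # so text content that happens to look like name="value" is left alone.
    def fix_attributes(tag: re.Match) -> str:
        return ATTR_NAME.sub(lambda m: m.group(1).lower() + m.group(2),
                             tag.group(0))

    return START_TAG.sub(fix_attributes, text)


if __name__ == "__main__":
    print(lowercase_markup('<CHAPTER TITLE="" TYPE="NUM"><P>Text</P></CHAPTER>'))
```

Restricting the second pass to start tags is a deliberate choice: it avoids touching body text, at the cost of the assumptions listed above. For anything beyond simple, regular files, a real XML parser or the XSLT approach is the safer route.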