Question

I want to take an XML file and replace the value of an element. For example, if my XML file looks like this:

<abc>
    <xyz>original</xyz>
</abc>

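I want to replace the xyz element's value of original, whatever it might be, with another string, so that the resulting file looks like this: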

<abc>
    <xyz>replacement</xyz>
</abc>

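How would you do this? I know I could write a Java program for it, but that seems like overkill for replacing a single element's value, and I think it could be done with sed using a regular expression substitution. However, I am less than a novice with that command, and I am hoping some kind soul reading this can give me the correct regular expression.

One idea is to do something like this: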

sed s/\<xyz\>.*\<\\xyz\>/\<xyz\>replacement\<\\xyz\>/ <original.xml >new.xml

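Maybe I would be better off replacing the entire line of the file with what I want, since I will know the element name and the new value I want to use? But that assumes the element in question is on a single line and that no other XML data is on the same line. I would rather have a command that simply replaces the value of element xyz with a new string I specify, without having to worry about whether the element is all on one line, and so on.

If sed is not the best tool for this job, then please point me to a better approach.

If anyone can steer me in the right direction I will really appreciate it; you will probably save me hours of trial and error. Thanks in advance!

- James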

Answer 1

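sed is not going to be an easy tool to use for multi-line replacements. It is possible to implement them using its N command and some looping, checking after each line is read whether the closing tag has been found, but it is not pretty and you will never remember it.

Of course, actually parsing the XML and replacing the tags is the safest thing to do, but if you know you will not run into any problems, you could try this: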

perl -p -0777 -e 's@<xyz>.*?</xyz>@<xyz>new-value</xyz>@sg' <xml-file>

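Breaking this down:

-p tells it to loop over the input and print each record
-0777 tells it to use end-of-file as the input separator, so that it reads the whole file in one slurp
-e means "here comes the code I want you to run"

And the substitution itself:

use @ as the delimiter so you do not have to escape /
use *?, the non-greedy version of *, to match as little as possible, so the match does not run all the way to the last occurrence of </xyz> in the file
use the s modifier to let . match newlines (to handle multi-line tag values)
use the g modifier to match the pattern multiple times

Tada! This prints the result to stdout. Once you have verified that it does what you want, add the -i option to tell it to edit the file in place.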
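For readers more at home in Java than Perl, the same substitution can be sketched with java.util.regex. This is only an illustration of the flags described above, not a replacement for the one-liner: the class name RegexElementReplace and the helper replaceElementText are hypothetical, and the sketch shares the one-liner's limitation that it treats the file as text rather than parsing it as XML (attributes on the tag, CDATA sections, or nested xyz elements would not be handled).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexElementReplace {

    // Hypothetical helper: replaces the text of every <tag>...</tag> pair in xml.
    public static String replaceElementText(String xml, String tag, String newValue) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";

        // DOTALL plays the role of Perl's s modifier: "." also matches newlines.
        // ".*?" is the non-greedy quantifier, so each match ends at the nearest closing tag.
        Pattern pattern = Pattern.compile(
                Pattern.quote(open) + ".*?" + Pattern.quote(close), Pattern.DOTALL);

        // quoteReplacement keeps any "$" or "\" in the new value literal.
        String replacement = Matcher.quoteReplacement(open + newValue + close);

        // replaceAll corresponds to Perl's g modifier: every match is replaced.
        return pattern.matcher(xml).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // The whole document is held in one string, like Perl's -0777 slurp mode.
        String xml = "<abc>\n    <xyz>line one\nline two</xyz>\n</abc>\n";
        System.out.print(replaceElementText(xml, "xyz", "new-value"));
    }
}
```

Run on the sample string, this prints the abc document with the two-line xyz value collapsed to new-value.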

Answer 2

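OK, so I bit the bullet and took the time to write a Java program that does what I want. Below is the method, called from my main() method, that does the actual work, in case it is helpful to someone else in the future: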

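import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**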
 * Takes an input XML file, replaces the text value of the node specified by an XPath parameter, and writes a new
 * XML file with the updated data.
 * 
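 * @param inputXmlFilePathName path of the XML file to read
 * @param outputXmlFilePathName path of the XML file to write
 * @param elementXpathExpression XPath expression selecting the node(s) whose text is replaced
 * @param elementValue new text value for the selected node(s)
 * @param replaceAllFoundElements if false, fail when the XPath matches more than one node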
 */
public static void replaceElementValue(final String inputXmlFilePathName,
                                       final String outputXmlFilePathName,
                                       final String elementXpathExpression,
                                       final String elementValue,
                                       final boolean replaceAllFoundElements)
{
    try
    {
        // get the template XML as a W3C Document Object Model which we can later write back as a file
        InputSource inputSource = new InputSource(new FileInputStream(inputXmlFilePathName));
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        Document document = documentBuilderFactory.newDocumentBuilder().parse(inputSource);

        // create an XPath expression to access the element's node
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        XPathExpression xpathExpression = xpath.compile(elementXpathExpression);

        // get the node(s) which correspond to the XPath expression and replace the value;
        // a NODESET evaluation yields an empty NodeList (not null) when nothing matches
        NodeList nodeList = (NodeList) xpathExpression.evaluate(document, XPathConstants.NODESET);
        if (nodeList == null || nodeList.getLength() == 0)
        {
            throw new RuntimeException("Failed to find a node corresponding to the provided XPath.");
        }
        if ((nodeList.getLength() > 1) && !replaceAllFoundElements)
        {
            throw new RuntimeException("Found multiple nodes corresponding to the provided XPath and multiple replacements not specified.");
        }
        for (int i = 0; i < nodeList.getLength(); i++)
        {
            nodeList.item(i).setTextContent(elementValue);
        }

        // prepare the DOM document for writing
        Source source = new DOMSource(document);

        // prepare the output file
        File file = new File(outputXmlFilePathName);
        Result result = new StreamResult(file);

        // write the DOM document to the file
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(source, result);
    }
    catch (Exception ex)
    {
        throw new RuntimeException("Failed to replace the element value.", ex);
    }
}

I run the program like this:

$ java -cp xmlutility.jar com.abc.util.XmlUtility input.xml output.xml '//name/text()' JAMES
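The XPath in the command above, //name/text(), selects the text node inside each name element rather than the element itself. The following in-memory sketch shows what that means for the replacement; the class XPathTextDemo and the method replaceAndGetText are hypothetical names introduced only for this illustration and are not part of the program above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTextDemo {

    // Hypothetical helper: parses xml, sets the text of every node matched by
    // xpathExpression to value, and returns the text content of the root element.
    public static String replaceAndGetText(String xml, String xpathExpression, String value) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpression, document, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                nodes.item(i).setTextContent(value);
            }
            return document.getDocumentElement().getTextContent();
        } catch (Exception ex) {
            throw new RuntimeException("Failed to evaluate the XPath expression.", ex);
        }
    }

    public static void main(String[] args) {
        String filled = "<person><name>original</name></person>";
        String empty = "<person><name></name></person>";

        // The text node exists, so it is found and replaced.
        System.out.println("[" + replaceAndGetText(filled, "//name/text()", "JAMES") + "]");
        // An empty element has no text node, so //name/text() matches nothing.
        System.out.println("[" + replaceAndGetText(empty, "//name/text()", "JAMES") + "]");
        // Selecting the element itself works in both cases.
        System.out.println("[" + replaceAndGetText(empty, "//name", "JAMES") + "]");
    }
}
```

One consequence worth keeping in mind: if the target element might be empty in the input file, selecting the element (//name) is the more forgiving choice, since setTextContent on an element replaces whatever children it has.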

Answer 3

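I hate to be a naysayer, but XML is anything but regular. A regular expression will probably be more trouble than it is worth. See here for more insight: Using C# Regular expression to replace XML element content

Your idea of a simple Java program might be a good one after all. An XSLT transformation may be easier if you know XSLT well. If you know Perl ... that's the way to go IMHO.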

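Having said that, if you choose to go with a regex in sed, be aware of what the /g flag does: placed at the end of the substitution, it makes sed replace every match on a line instead of only the first. It does not make a pattern match across lines, because sed processes its input one line at a time; for a multi-line match you would have to pull the following lines into the pattern space yourself, for example with the N command.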

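Also, the regex you proposed is "greedy". It will grab the largest group of characters it can, because ".*" will match from the first occurrence of the opening tag to the last occurrence of the closing tag. In regex engines that support it, such as Perl's, you can make it "lazy" by changing the wildcard to ".*?". Putting the question mark after the asterisk tells it to match as little as possible, so only one element is matched at a time. Note that the POSIX regular expressions used by sed have no lazy quantifiers.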
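The difference between the greedy and lazy forms is easy to see on a line that contains the element twice. The sketch below uses Java's String.replaceAll, whose regex engine supports lazy quantifiers; the class name GreedyVsLazy and the sample line are made up for this illustration.

```java
public class GreedyVsLazy {

    // Greedy: ".*" runs from the first <xyz> to the LAST </xyz> on the line.
    public static String greedy(String input) {
        return input.replaceAll("<xyz>.*</xyz>", "<xyz>X</xyz>");
    }

    // Lazy: ".*?" stops at the NEAREST </xyz>, so each element is matched on its own.
    public static String lazy(String input) {
        return input.replaceAll("<xyz>.*?</xyz>", "<xyz>X</xyz>");
    }

    public static void main(String[] args) {
        String line = "<xyz>one</xyz><other>keep</other><xyz>two</xyz>";
        System.out.println(greedy(line)); // prints <xyz>X</xyz>
        System.out.println(lazy(line));   // prints <xyz>X</xyz><other>keep</other><xyz>X</xyz>
    }
}
```

The greedy version silently swallows the other element that sits between the two xyz elements, which is the failure mode described above.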

Answer 4

I was trying to do the same thing and came across this [gn]awk script that achieves it.

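# Split each line on "<", "|" or ">", so for "<xyz>value</xyz>" $2 is the tag name.
BEGIN { FS = "[<|>]" }
{
    if ($2 == "xyz") {
        # Replace the text between the first ">" and the following "<".
        # (sub($3, ...) would treat the old value as a regex and could
        # match inside the tag name instead of the element value.)
        sub(/>[^<]*</, ">replacement<")
    }
    print
}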

Replace the value of an XML element? sed regular expression?
