Question

I have a large amount of XML markup.

<SERVICE>
<NAME>
sh_SEET15002GetReKeyDetails
</NAME>
<ID>642</ID>
</SERVICE>

I would like to format it as follows. I tried xmllint, but it did not work for me. Please help.

<SERVICE>
<NAME>sh_SEET15002GetReKeyDetails</NAME>
<ID>642</ID>
</SERVICE>

Answer 1

xmllint -format -recover nonformatted.xml > formated.xml

For tab indentation:

export XMLLINT_INDENT=`echo -e '\t'`

For four-space indentation:

export XMLLINT_INDENT='    '
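
If the formatter leaves the line breaks inside <NAME> untouched (pretty-printers generally preserve text nodes that contain non-whitespace characters), the text can be trimmed before re-indenting. The following is only a minimal sketch of that idea, not part of the original answer: it assumes Python 3.9 or later (for ET.indent), the helper name tidy_xml is hypothetical, and it strips whitespace around every text node, so it is not suitable for mixed content where that whitespace matters.

```python
import xml.etree.ElementTree as ET

# Sample input copied from the question.
SAMPLE = """<SERVICE>
<NAME>
sh_SEET15002GetReKeyDetails
</NAME>
<ID>642</ID>
</SERVICE>"""


def tidy_xml(xml_text, indent="  "):
    """Trim whitespace around text nodes, then re-indent the tree.

    tidy_xml is a hypothetical helper name used only for illustration.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Text directly inside the element: strip it, drop it if empty.
        if elem.text is not None:
            elem.text = elem.text.strip() or None
        # Tail text follows the element's closing tag.
        if elem.tail is not None:
            elem.tail = elem.tail.strip() or None
    # Re-insert line breaks and indentation (Python 3.9+).
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # indent="" keeps one element per line without leading spaces,
    # matching the layout requested in the question.
    print(tidy_xml(SAMPLE, indent=""))
```

Passing indent="\t" or indent="    " gives tab or four-space indentation instead.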

Answer 2

Without programming, you can use the Eclipse XML Source Editor. Take a look at this answer.

By the way, have you tried xmllint -format -recover nonformatted.xml > formated.xml?

Edit

You can try the XMLStarlet Command Line XML Toolkit.

5. Formatting XML documents
====================================================
xml fo --help
XMLStarlet Toolkit: Format XML document
Usage: xml fo [<options>] <xml-file>
where <options> are
   -n or --noindent            - do not indent
   -t or --indent-tab          - indent output with tabulation
   -s or --indent-spaces <num> - indent output with <num> spaces
   -o or --omit-decl           - omit xml declaration <?xml version="1.0"?>
   -R or --recover             - try to recover what is parsable
   -D or --dropdtd             - remove the DOCTYPE of the input docs
   -C or --nocdata             - replace cdata section with text nodes
   -N or --nsclean             - remove redundant namespace declarations
   -e or --encode <encoding>   - output in the given encoding (utf-8, unicode...)
   -H or --html                - input is HTML
   -h or --help                - print help

Answer 3

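In Bluefish, go to "Edit / Preferences / External filters". Add a new filter, for example "Tidy XML", and enter the command "| tidy -xml -i |". Then open any XML file in Bluefish and choose "Tools / Filters / Tidy XML" from the menu. This formats the open XML file.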

Prerequisites: Bluefish and tidy must be installed.

Answer 4

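I do this from gedit. In gedit you can add any script, in particular a Python script, as an external tool. The script reads its data from stdin and writes its output to stdout, so it also works as a stand-alone program. It lays out the XML and sorts the child nodes.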

#!/usr/bin/env python
# encoding: utf-8

"""
This is a gedit plug-in to sort and layout XML.

In gedit, to add this tool, open: menu -- Tools -- Manage External Tools...
Create a new tool: click [+] under the list of tools, type in "Sort XML" as tool name,
paste the whole text from this file in the "Edit:" box, then 
configure the tool:
Input: Current selection
Output: Replace current selection

In gedit, to run this tool,
FIRST SELECT THE XML,
then open: menu -- Tools -- External Tools > -- Sort XML

"""


from lxml import etree
import sys
import io

def headerFirst(node):
    """Return the sorting key prefix, so that 'header' will go before any other node
    """
    nodetag=('%s' % node.tag).lower()
    if nodetag.endswith('}header') or nodetag == 'header':
        return '0'
    else:
        return '1'

def get_node_key(node, attr=None):
    """Return the sorting key of an xml node
    using tag and attributes
    """
    if attr is None:
        return '%s' % node.tag + ':'.join([node.get(attr)
                                        for attr in sorted(node.attrib)])
    if attr in node.attrib:
        return '%s:%s' % (node.tag, node.get(attr))
    return '%s' % node.tag


def sort_children(node, attr=None):
    """ Sort children along tag and given attribute.
    if attr is None, sort along all attributes"""
    if not isinstance(node.tag, str):  # PYTHON 2: use basestring instead
        # not a TAG, it is comment or DATA
        # no need to sort
        return
    # sort child along attr
    node[:] = sorted(node, key=lambda child: (headerFirst(child) + get_node_key(child, attr)))
    # and recurse
    for child in node:
        sort_children(child, attr)


def sort(unsorted_stream, sorted_stream, attr=None):
    """Sort unsorted xml file and save to sorted_file"""
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(unsorted_stream,parser=parser)
    root = tree.getroot()
    sort_children(root, attr)

    sorted_unicode = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8")

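    # tostring() with an explicit encoding returns bytes, so write to the
    # binary buffer when the stream has one (Python 3); on Python 2,
    # sys.stdout accepts the byte string directly.
    getattr(sorted_stream, 'buffer', sorted_stream).write(sorted_unicode)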


#we could do this, 
#sort(sys.stdin, sys.stdout)
#but we want to check selection:

# Read the selection as bytes: sys.stdin.buffer on Python 3, sys.stdin on Python 2.
inputstr = getattr(sys.stdin, 'buffer', sys.stdin).read()
if not inputstr:
    sys.stderr.write('no XML selected!\n')
    sys.exit(100)

sort(io.BytesIO(inputstr), sys.stdout)

There are two tricky things:

    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(unsorted_stream,parser=parser)

By default, whitespace is not ignored, which can produce strange results.

    sorted_unicode = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8")

Likewise, pretty-printing is not enabled by default.
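
To see why both settings matter together, here is a small sketch (assuming lxml is installed; the RAW sample and variable names are made up for illustration). When the whitespace-only text nodes are kept, pretty_print=True leaves the existing layout alone; once the parser drops them, the serializer is free to indent.

```python
from lxml import etree

# Illustrative input: one element per line, no indentation.
RAW = b"<a>\n<b>1</b>\n<c>2</c>\n</a>"

# Default parser: the "\n" text nodes between elements are kept,
# so pretty_print does not add any indentation of its own.
kept = etree.tostring(etree.fromstring(RAW), pretty_print=True)

# Parser that drops whitespace-only text nodes: pretty_print can now
# lay out the children with its own indentation.
parser = etree.XMLParser(remove_blank_text=True)
stripped = etree.tostring(etree.fromstring(RAW, parser), pretty_print=True)

print(kept.decode())
print(stripped.decode())
```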

I configured this tool to work on the current selection and to replace it, because there are usually HTTP headers in the same file. YMMV.

$ python --version
Python 2.7.6

$ lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:   trusty

If you do not need the child nodes sorted, just comment out the corresponding line.

Links: here, here

UPDATE: v2 puts the header before anything else; fixed whitespace.

How to format an XML document in Linux

4 answers: