Question

This is a question about efficiency rather than troubleshooting. I have the following code snippet:

# The -R flag restores malformed XML
xmlstarlet -q fo -R <<<"$xml_content" | \
    # Delete xml_data
    xmlstarlet ed -d "$xml_data" | \
    # Delete index
    xmlstarlet ed -d "$xml_index" | \
    # Delete specific objects
    xmlstarlet ed -d "$xml_nodes/objects" | \
    # Append new node
    xmlstarlet ed -s "$xml_nodes" -t elem -n subnode -v "Hello World" | \
        # Add x attribute to node
        xmlstarlet ed -i "($xml_nodes)[last()]" -t attr -n x -v "0" | \
        # Add y attribute to node
        xmlstarlet ed -i "($xml_nodes)[last()]" -t attr -n y -v "0" | \
        # Add z attribute to node
        xmlstarlet ed -i "($xml_nodes)[last()]" -t attr -n z -v "1" \
            > "$output_file"

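The variable $xml_content contains the XML tree of contents and nodes, read with the cat command from a file that is 472.6 MB in size.
The variable $output_file, as its name indicates, contains the path to the output file.
The remaining variables simply contain the XPaths I want to edit.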

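The brief article that helped me come up with this code says:

This is a bit ineffective since the xml file is parsed and written twice.

In my case it is parsed and written more than twice (eventually in a loop, over 1000 times).

With the script above, that short fragment alone takes 4 minutes and 7 seconds to run.

I assume that the excessive, repetitive and probably inefficient piping, together with the file size, is why the code runs slowly, and that the more subnodes I end up inserting or deleting, the slower it will get.

I apologize in advance if I am repeating myself or raising an old, probably already answered topic, but I would really like to understand in detail how xmlstarlet works with large XML documents.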

Update

As @Cyrus claimed in his earlier answer:

These two xmlstarlet calls should do the job:

xmlstarlet -q fo -R <<<"$xml_content" |\
  xmlstarlet ed \
    -d "$xml_data" \
    -d "$xml_index" \
    -d "$xml_nodes/objects" \
    -s "$xml_nodes" -t elem -n subnode -v "Hello World" \
    -i "($xml_nodes)[last()]" -t attr -n x -v "0" \
    -i "($xml_nodes)[last()]" -t attr -n y -v "0" \
    -i "($xml_nodes)[last()]" -t attr -n z -v "1" > "$output_file"

This produced the following errors:

-:691.84: Attribute x redefined
-:691.84: Attribute z redefined
-:495981.9: xmlSAX2Characters: huge text node: out of memory
-:495981.9: Extra content at the end of the document

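I honestly do not know how those errors came about, because I changed the code too often while testing various scenarios and potential alternatives. However, this is what did the trick for me: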

xmlstarlet ed --omit-decl -L \
    -d "$xml_data" \
    -d "$xml_index" \
    -d "$xml_nodes/objects" \
    -s "$xml_nodes" -t elem -n subnode -v "Hello World" \
    "$temp_xml_file"

xmlstarlet ed --omit-decl -L \
    -i "($xml_nodes)[last()]" -t attr -n x -v "0" \
    -i "($xml_nodes)[last()]" -t attr -n y -v "0" \
    -i "($xml_nodes)[last()]" -t attr -n z -v "1" \
    "$temp_xml_file"

Regarding the actual data being inserted, this is what I have at the beginning:

...
<node>
    <subnode>A</subnode>
    <subnode>B</subnode>
    <objects>1</objects>
    <objects>2</objects>
    <objects>3</objects>
    ...
</node>
...

Executing the (split) code above gives me what I want:

...
<node>
    <subnode>A</subnode>
    <subnode>B</subnode>
    <subnode x="0" y="0" z="1">Hello World</subnode>
</node>
...

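By splitting them, xmlstarlet is able to insert the attributes into the newly created node; otherwise it adds them to the last() instance of the selected XPath before the --subnode is even created. To some extent this is still inefficient, but the code now runs in less than a minute.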

The following code,

xmlstarlet ed --omit-decl -L \
    -d "$xml_data" \
    -d "$xml_index" \
    -d "$xml_nodes/objects" \
    -s "$xml_nodes" -t elem -n subnode -v "Hello World" \
    -i "($xml_nodes)[last()]" -t attr -n x -v "0" \
    -i "($xml_nodes)[last()]" -t attr -n y -v "0" \
    -i "($xml_nodes)[last()]" -t attr -n z -v "1" \
    "$temp_xml_file"

however, gives me this:

...
<node>
    <subnode>A</subnode>
    <subnode x="0" y="0" z="1">B</subnode>
    <subnode>Hello World</subnode>
</node>
...

When the xmlstarlet calls are joined into one, as in this post also answered by @Cyrus, it somehow adds the attributes first and only then creates the --subnode whose innerText is Hello World.

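Can anyone explain why this strange behaviour occurs?

This is another post, which states that "every edit operation is performed in sequence".

The article above explains exactly what I am looking for, but I cannot get it all to work in a single xmlstarlet ed \. Alternatively, I tried:

Replacing ($xml_nodes)[last()] with $xml_nodes[text() = 'Hello World']
Using $prev (or $xstar:prev) as the argument to -i, as in this answer and this reference
The temporary element name trick: renaming the temporary node via -r after the attr has been added

All of the above insert the --subnode but leave the new element without attributes.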

Note: I am running XMLStarlet 1.6.1 on OS X El Capitan v 10.11.3

Bonus

As I mentioned at the beginning, I want to run this in a loop, along these lines:

list="$(tr -d '\r' < $names)"
for name in $list; do
    xmlstarlet ed --omit-decl -L \
        -d "$xml_data" \
        -d "$xml_index" \
        -d "$xml_nodes/objects" \
        -s "$xml_nodes" -t elem -n subnode -v "$name" \
        -i "($xml_nodes)[last()]" -t attr -n x -v "0" \
        -i "($xml_nodes)[last()]" -t attr -n y -v "0" \
        -i "($xml_nodes)[last()]" -t attr -n z -v "1" \
        "$temp_xml_file"
done

$list contains over a thousand different names that need to be added together with their respective attributes. The --value of each attribute may also vary from one iteration to the next. Given the model above:

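What is the fastest and most accurate version of such a loop, given that the attributes must be added correctly to the corresponding node?
Would it be faster to create the list of nodes in an external txt file and later add those XML elements (from the txt file) to another XML file? If so, how? Perhaps with sed or grep?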

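Regarding the last question, I am referring to something like this. The node to which the XML from the txt file is added must be a specific one, for example selectable by XPath, not least because I only want to edit certain nodes.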
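To make the "external txt file" idea concrete, here is a minimal sketch of generating the elements as text from a list of names. The function names `xml_escape` and `make_fragment` are hypothetical, the x/y/z values are simply the placeholder values from the model above, and the sketch only covers producing the fragment, not merging it into the large document.

```shell
#!/bin/sh
# Sketch only: xml_escape and make_fragment are hypothetical helper names.

# Escape the characters that are special in XML text content.
xml_escape() {
    printf '%s' "$1" |
        sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Read names from stdin (one per line) and print one <subnode> element each.
# The attribute values are the placeholders used in the question's model.
make_fragment() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue          # skip empty lines
        printf '    <subnode x="0" y="0" z="1">%s</subnode>\n' "$(xml_escape "$name")"
    done
}

# Example: two names, one of which needs escaping.
printf 'Alpha\nTom & Jerry\n' | make_fragment
```

Text tools such as sed and grep work line by line and know nothing about XPath, so splicing this fragment into the target file with them would have to rely on a textual marker that identifies the intended node unambiguously.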

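Note: the model above is just an example. The actual loop will add 26 --subnodes per iteration and 3 or 4 attr for each --subnode. That is why it matters that xmlstarlet adds the attr to the right --subnode and not to some other element. They have to be added in order.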
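One possible way to reduce the number of parses, sketched here under assumptions, is to extend the two-step split that worked above: collect all the -s operations for every name into one argument list and run xmlstarlet once, then do the same for the -i attribute operations in a second run. This is not a confirmed fix for the ordering behaviour in a single call. The helper name `edit_pass`, the value `xml_nodes='//node'`, the sample names and the selection of each new node by its text are all hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch only. xml_nodes, the attribute values and the sample names are
# hypothetical placeholders, not values taken from the question.
xml_nodes='//node'

# Build every edit operation for one pass into "$@", then hand the whole list
# to a single xmlstarlet process. Names are read from stdin, one per line.
#   $1 = pass (1: append subnodes, 2: add attributes), $2 = XML file to edit
edit_pass() {
    pass=$1 file=$2
    set --                      # reuse the positional parameters as the argument list
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        if [ "$pass" -eq 1 ]; then
            set -- "$@" -s "$xml_nodes" -t elem -n subnode -v "$name"
        else
            # Select the new node by its text; this breaks if a name contains
            # a single quote or if an existing subnode has the same text.
            target="$xml_nodes/subnode[text()='$name']"
            set -- "$@" \
                -i "$target" -t attr -n x -v 0 \
                -i "$target" -t attr -n y -v 0 \
                -i "$target" -t attr -n z -v 1
        fi
    done
    if [ -f "$file" ] && command -v xmlstarlet >/dev/null 2>&1; then
        xmlstarlet ed --omit-decl -L "$@" "$file"
    else
        # No file (or no xmlstarlet): only report how many arguments were built.
        echo "$# arguments built (dry run)"
    fi
}

# Example with a small CRLF-terminated names file and no target file (dry run).
names_file=$(mktemp)
printf 'Alpha\r\nBeta\r\n' > "$names_file"
tr -d '\r' < "$names_file" | edit_pass 1 "${temp_xml_file:-}"
tr -d '\r' < "$names_file" | edit_pass 2 "${temp_xml_file:-}"
rm -f "$names_file"
```

With this shape the file is parsed and written twice in total rather than once or more per name. Reading the names with `while IFS= read -r` also avoids the word splitting that `for name in $list` applies to names containing spaces. For very long lists the argument list may exceed the system's command-line length limit, in which case the names would need to be processed in batches.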

Answer 1

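Why not use parallel (or sem), so that you can run the jobs in parallel across the number of cores available on your machine? The code I use parses an array whose entries hold 2 variables, which I declare locally just to make sure the processes are isolated.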

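# Note: `local` is only valid inside a function body.
for array in "${listofarrays[@]}"; do
    local var1 var2
    # Split each "a,b" entry into two variables
    IFS=, read -r var1 var2 <<< "$array"
    # Queue the job; -j +0 runs one job per CPU core
    sem -j +0 \
        <code goes here>
done
# Block until all queued jobs have finished
sem --wait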
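For readers without GNU parallel installed, the same loop shape can be tried with plain shell background jobs and `wait`. This is a swapped-in technique, not what the answer uses: it starts every job at once instead of limiting them to the number of cores as `sem -j +0` does. The function `process_pair`, the sample entries and the output directory are hypothetical stand-ins for `<code goes here>`.

```shell
#!/bin/sh
# Sketch only: process_pair is a hypothetical stand-in for "<code goes here>".
outdir=$(mktemp -d)

process_pair() {
    # $1 and $2 are the two comma-separated fields of one entry.
    printf '%s:%s\n' "$1" "$2" > "$outdir/$1.out"
}

for entry in "a,1" "b,2" "c,3"; do
    var1=${entry%%,*}               # text before the first comma
    var2=${entry#*,}                # text after the first comma
    process_pair "$var1" "$var2" &  # run the job in the background
done
wait                                # block until all background jobs finish

cat "$outdir"/*.out
```

Each job writes to its own file, so the jobs do not interfere with one another. Parallel jobs that all edit the same XML file in place would not have that property.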

Answer 2

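unbuffer might help

unbuffer comes from the expect package

To build a pipeline with two commands a, z:

unbuffer a | z

To build a pipeline with three (or more) commands a, b, z,
add the -p option inside the pipeline:

unbuffer a | unbuffer -p b | z

Source: stolen from stackexchange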

Inserting 1000+ nodes and attributes with XMLStarlet - running slow
