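Split a parent node into two siblings at a given node

Time: 2014-01-18 10:11:14

Tags: ruby nokogiri

I have an XML document (sample below), and I need to split a node in two at one of its child nodes: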

<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

This is the resulting XML:

<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>
    </trkseg>   <-- this line is new
    <trkseg>    <-- this line is new
      <trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

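This XML is simplified; in reality there are thousands of trkpt elements.

I have no problem finding where to split using Nokogiri, but I don't know how to perform the split.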

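2 answers:

Answer 0 (score: 1)

You may find this easier if you think in terms of the nodes of the parsed data structure rather than the elements of the textual XML.

In this case you want to add a new trkseg node after the first one, then remove the last trkpt node and move it into this new node. Something like this should work: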

require 'nokogiri'

# the_original_xml is assumed to hold the XML string from the question
d = Nokogiri.XML(the_original_xml)

# find the node to move and remove it from its current position
trkpt3 = d.at_xpath("//trkpt[3]")
trkpt3.remove

# create a new node of type trkseg
new_node = d.create_element("trkseg")

# add the trkpt3 node to this new node
new_node.add_child(trkpt3)

# add the new node into position as a child of the trk node
d.at_xpath("//trk").add_child(new_node)

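The actual result of this isn't exactly what you're after, since it doesn't take whitespace nodes into account, but the structure is the same. It looks like this: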

<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>

    </trkseg>
  <trkseg><trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt></trkseg></trk>
</gpx>

If it matters, you can rebuild the document more precisely to get the desired result.

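In the real case you will probably need different XPath queries, but the general idea of manipulating the DOM structure with methods such as remove, add_child, << and create_element is what you need.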


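General method

Here is an example of a method that can be used to split a node in two, with the split occurring after the node passed in as the argument: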

def split_after(node)
  # Find all the nodes to be moved to the new node
  to_move = node.xpath("following-sibling::node()")
  # The parent node, this is the node that will be "split"
  p = node.parent

  # Create the new node
  new_node = node.document.create_element(p.name)

  # Remove the nodes from the original position
  # and add them to the new node
  to_move.remove
  new_node << to_move

  # Insert the new node into the correct position
  p.add_next_sibling(new_node)
end

This uses add_next_sibling, which ensures the new node is added in the correct position when the node being split has siblings of its own.

Answer 1 (score: 0)

I would do it like this:

require 'nokogiri'

doc_string = <<-xml
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
xml

doc = Nokogiri.XML(doc_string) do |config|
  config.default_xml.noblanks
end

# First, find the node at which to split (here, the last trkpt).
split_node = doc.at("//trkpt[last()]")

# Keep a reference to its parent, the trkseg being split.
parent_node_of_split_node = split_node.parent

# Remove the split node from its current position in the document.
split_node.unlink

# Create a new <trkseg> node to hold the split node.
new_node_to_add = Nokogiri::XML::Node.new('trkseg', doc)

# Add the split node as a child of the newly created <trkseg>.
new_node_to_add.add_child split_node

# Attach the new <trkseg> to the parent of the original <trkseg> (the <trk> node).
new_node_to_add.parent = parent_node_of_split_node.parent

puts doc.to_xml(:indent => 2)

# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <gpx>
# >>   <trk>
# >>     <trkseg>
# >>       <trkpt>
# >>         <time>2014-01-16T14:33:35.000Z</time>
# >>       </trkpt>
# >>       <trkpt>
# >>         <time>2014-01-16T14:33:39.000Z</time>
# >>       </trkpt>
# >>     </trkseg>
# >>     <trkseg>
# >>       <trkpt>
# >>         <time>2014-01-16T15:44:14.000Z</time>
# >>       </trkpt>
# >>     </trkseg>
# >>   </trk>
# >> </gpx>

Methods I used here: