I have an XML document (example below), and I need to split one node into two at a particular child node.
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
This is the XML I want to end up with:
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>
    </trkseg>   <-- this line is new
    <trkseg>    <-- this line is new
      <trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
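This XML is simplified; the real file has thousands of trkpt nodes.
I have no problem using Nokogiri to find where to split, but I don't know how to actually perform the split.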
Answer 0 (score: 1)
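You may find this easier if you think in terms of the nodes of the parsed data structure rather than the textual XML elements.
In this case, you want to add a new trkseg node after the first one, then remove the last trkpt node and move it into this new node. Something like this should work: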
require 'nokogiri'

# the_original_xml holds the XML string from the question
d = Nokogiri.XML(the_original_xml)
# find the node to move and remove it from its current position
trkpt3 = d.at_xpath("//trkpt[3]")
trkpt3.remove
# create a new node of type trkseg
new_node = d.create_element("trkseg")
# add the trkpt3 node to this new node
new_node.add_child(trkpt3)
# add the new node into position as the last child of the trk node
d.at_xpath("//trk").add_child(new_node)
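The actual result is not exactly what you are after, because it doesn't take the whitespace text nodes into account, but the structure is the same. It looks like this: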
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
<trk>
<trkseg>
<trkpt>
<time>2014-01-16T14:33:35.000Z</time>
</trkpt>
<trkpt>
<time>2014-01-16T14:33:39.000Z</time>
</trkpt>
</trkseg>
<trkseg><trkpt>
<time>2014-01-16T15:44:14.000Z</time>
</trkpt></trkseg></trk>
</gpx>
If it matters, you can rebuild the document more precisely to get exactly the result you want.
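In your real case you will probably need different XPath queries, but the general idea of manipulating the DOM structure with methods such as remove, add_child, << and create_element is all you need.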
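Here is an example of a method you could use to split a node into two, with the split occurring after the node passed in as the argument: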
def split_after(node)
  # Find all the nodes to be moved to the new node
  to_move = node.xpath("following-sibling::node()")
  # The parent node; this is the node that will be "split"
  parent = node.parent
  # Create the new node with the same name as the parent
  new_node = node.document.create_element(parent.name)
  # Remove the nodes from their original position
  # and add them to the new node
  to_move.remove
  new_node << to_move
  # Insert the new node directly after the original parent
  parent.add_next_sibling(new_node)
end
This uses add_next_sibling, which ensures the new node is added in the correct position when the node being split itself has siblings.
Answer 1 (score: 0)
I would do it like this:
require 'nokogiri'
doc_string = <<-xml
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
<trk>
<trkseg>
<trkpt>
<time>2014-01-16T14:33:35.000Z</time>
</trkpt>
<trkpt>
<time>2014-01-16T14:33:39.000Z</time>
</trkpt>
<trkpt>
<time>2014-01-16T15:44:14.000Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
xml
doc = Nokogiri.XML(doc_string) do |config|
  config.default_xml.noblanks
end
# First, find the node at which to split (here, as an example, the last trkpt).
split_node = doc.at("//trkpt[last()]")
# Keep a reference to the split node's parent (the original <trkseg>).
parent_node_of_split_node = split_node.parent
# Detach the split node from the document.
split_node.unlink
# Create a new <trkseg> node to hold the split node.
new_node_to_add = Nokogiri::XML::Node.new('trkseg', doc)
# Add the split node as a child of the new <trkseg>.
new_node_to_add.add_child split_node
# Attach the new <trkseg> to the original trkseg's parent (<trk>), as its last child.
new_node_to_add.parent = parent_node_of_split_node.parent
puts doc.to_xml(:indent => 2)
# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <gpx>
# >>   <trk>
# >>     <trkseg>
# >>       <trkpt>
# >>         <time>2014-01-16T14:33:35.000Z</time>
# >>       </trkpt>
# >>       <trkpt>
# >>         <time>2014-01-16T14:33:39.000Z</time>
# >>       </trkpt>
# >>     </trkseg>
# >>     <trkseg>
# >>       <trkpt>
# >>         <time>2014-01-16T15:44:14.000Z</time>
# >>       </trkpt>
# >>     </trkseg>
# >>   </trk>
# >> </gpx>
Methods I used here: