Question

Here is an example of the transformation I want to achieve. Source XML:

<cats>
  <cat>John</cat>
  <cat>Peter</cat>
</cats>

Result:

{'cats' => ['John', 'Peter']}

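I want the value of 'cats' in the resulting hash to be an array even if the source XML contains only one <cat>.

So, I would like the parser to apply this rule:

If node xyzs contains one or more child nodes named xyz (and no other nodes), then node xyzs should be represented in the resulting hash as an array named xyzs (and each element of the array should be the content of the corresponding xyz element).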
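The rule above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions, not a finished converter: it uses REXML (bundled with Ruby) rather than XmlSimple, it treats "plural" as simply "singular plus s", it ignores attributes, and the names plural_group?, to_structure and xml_to_hash are made up for this example.

```ruby
require 'rexml/document'

# Hypothetical helper: true when +node+ has at least one child element,
# all child elements share one name, and the parent's name is that name
# plus "s". Naive pluralization is an assumption of this sketch.
def plural_group?(node)
  names = node.elements.map(&:name).uniq
  names.size == 1 && node.name == "#{names.first}s"
end

# Hypothetical helper: converts an element into an Array (plural group),
# a String (leaf) or a Hash (anything else). Repeated sibling names in
# the Hash case overwrite each other; attributes are ignored.
def to_structure(node)
  return node.elements.map { |child| to_structure(child) } if plural_group?(node)
  return node.text.to_s.strip unless node.has_elements?

  node.elements.each_with_object({}) do |child, hash|
    hash[child.name] = to_structure(child)
  end
end

# Hypothetical entry point: wraps the converted root under its own name.
def xml_to_hash(xml_text)
  root = REXML::Document.new(xml_text).root
  { root.name => to_structure(root) }
end

result = xml_to_hash('<cats><cat>John</cat></cats>')
# result is { "cats" => ["John"] }, an array even with a single <cat>
```

Because the grouping decision is made per node while walking the tree, no element names have to be listed in advance.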

Here is how this can be done with the XmlSimple lib:

XmlSimple.xml_in('cats.xml',{:forcearray=>['cat'], :grouptags=>{"cats"=>"cat"}})

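However, I have to type in the names of all the target elements, and there seems to be no other way to define the forcearray / grouptags behavior in XmlSimple.

It would not be hard to hack together a preprocessing routine that extracts all the names and then passes them to the xml_in method, but maybe there is a more elegant (i.e. already written) way?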
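For illustration, the preprocessing routine mentioned above might look roughly like this. It is a sketch under assumptions: it scans the document with REXML (bundled with Ruby), assumes the plural name is the singular name plus "s", and grouping_options is a made-up helper name. It only builds the option hash; the call to xml_in is left as in the question.

```ruby
require 'rexml/document'

# Hypothetical helper: finds every element whose child elements all share
# one name equal to the parent's name minus a trailing "s", and returns
# options shaped like the ones passed to XmlSimple.xml_in in the question.
def grouping_options(xml_text)
  doc = REXML::Document.new(xml_text)
  grouptags = {}

  # '//*' selects every element in the document, in document order.
  REXML::XPath.match(doc, '//*').each do |node|
    child_names = node.elements.map(&:name).uniq
    next unless child_names.size == 1 && node.name == "#{child_names.first}s"

    grouptags[node.name] = child_names.first
  end

  { forcearray: grouptags.values.uniq, grouptags: grouptags }
end

xml = '<pets><cats><cat>John</cat></cats><dogs><dog>Rex</dog></dogs></pets>'
options = grouping_options(xml)
# options[:grouptags] is { "cats" => "cat", "dogs" => "dog" }
```

The resulting hash could then be passed as the second argument to XmlSimple.xml_in. Note that the check is made per element occurrence, so a name is registered even if another element with the same name elsewhere in the file has mixed children.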

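(I am happy to use any other XML parsing library if it can do the transformation.)

UPD: In case it matters, my ultimate goal is to save the resulting hash to MongoDB (i.e. the overall transformation is XML -> BSON).

UPD2: Again, I do NOT want to specify the names of the elements that should be treated as arrays; I want the lib to do that for me.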

Answer 1

Using Nokogiri, we can write this code:

require 'inflector'
require 'nokogiri'

def get_xml_stuff(xml, singular)
  plural = Inflector.pluralize(singular)
  return_hash = {plural => []}
  xml.xpath("*/#{plural}/#{singular}").each { |tag| return_hash[plural] << tag.text}
  return return_hash
end

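Per my testing, this handles the simple case and matches your XmlSimple code. As for your further requirement:

If node xyzs contains one or more child nodes named xyz (and no other nodes), then node xyzs should be represented as an array in the resulting hash with the name xyzs (and each element of the array should be the content of the corresponding xyz element).

def get_xml_stuff(xml, singular)
  plural = Inflector.pluralize(singular)
  return_hash = {plural => []}
  path = xml.xpath("*/#{plural}/#{singular}")
  # Collect the values only if every child element of the plural node has the singular name.
  path.each { |tag| return_hash[plural] << tag.text} unless path.size != xml.xpath("*/#{plural}/*").size
  return return_hash
end

That is still not perfect if the same plural appears more than once in the file.

Answering UPD2. My new version of the function is as follows:

def get_xml_stuff(xml, plural)
  singular = Inflector.singularize(plural)
  return_hash = {plural => []}
  path = xml.xpath("./#{singular}")
  path.each { |tag| return_hash[plural] << tag.text} unless path.size != xml.xpath("./*").size
  return return_hash
end

Here we start from the plural parent node and collect all the singular child nodes, provided that all of the named children have that singular name. My new test code becomes:

sample_xml = Nokogiri::XML(sample_xml_text)
sample_xml.children.xpath("*").each do |child|
  array = get_xml_stuff(child, child.name)
  p array
end

If there is no wrapping tag like the <pets> in my example, the following should work:

sample_xml = Nokogiri::XML(sample_xml_text)
array = get_xml_stuff(sample_xml.children.first, sample_xml.children.first.name)
p array

End UPD2

For reference, my test XML wrapped the plural elements in a <pets> root tag.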

Answer 2

First find the element names that end in s:

names = doc.search('*[name()$="s"]').map(&:name).uniq
#=> ["cats"]

The rest is just mapping and hashing:

Hash[names.map{|name| [name, doc.search("#{name} > #{name.sub /s$/, ''}").map(&:text)]}]
#=> {"cats"=>["John", "Peter"]}
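One thing worth keeping in mind with this approach is that stripping a trailing "s" is only a rough stand-in for real singularization. The small sketch below (naive_singular is a made-up helper name, not part of any library) shows where the shortcut holds and where it does not:

```ruby
# Hypothetical helper: strips one trailing "s", like name.sub(/s$/, '')
# in the snippet above.
def naive_singular(name)
  name.sub(/s$/, '')
end

%w[cats boxes status].each do |name|
  puts "#{name} -> #{naive_singular(name)}"
end
# cats   -> cat    (matches the real singular)
# boxes  -> boxe   (the real singular is "box")
# status -> statu  (not a plural at all)
```

For element names like cats/cat this is sufficient; for irregular names, an inflection library such as the one used in the first answer may be a better fit.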

XML parsing with smart tag grouping in Ruby
