How to iterate over nested XML elements with Nokogiri in Ruby

Time: 2014-02-19 16:27:22

Tags: ruby xml xpath iterator nokogiri

I'm trying to iterate over a folder structure stored as XML using Nokogiri, but I'm stuck:

<test>
   <folder name="Folder A">
      <folder name="Folder A1">
         <file name="a.txt">Cool file</file>
      </folder>
      <folder name="Folder A2"></folder>
   </folder>
   <folder name="Folder B">
      <folder name="Folder B1"></folder>
      <folder name="Folder B2">
         <folder name="Folder B21">
            <file name="b.txt"></file>
         </folder>
      </folder>
   </folder>
</test>

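I want to iterate over this so that I can build the tree of folders and files (folders A1 and A2 inside folder A, folders B1 and B2 inside folder B, and folder B21 inside folder B2).

So I do this: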

nodes = allnodes.xpath('//folder')
nodes.each do |node|
  puts "name => #{node.attributes['name']}"
end

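But this lists all of the folders (A, A1, A2, B, B1, B2, B21). How can I select only the folders directly inside the current folder, without descending into the folders nested below them, and then pass each one to the same recursive function?

Thank you very much for your help :)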

4 answers:

Answer 0: (score: 7)

When you use XPath with //foo, you find foo elements at any level. If you instead use ./foo, or just foo, you find only child elements. Thus:

# Given an XML node, yields the node and all <file> children
# Then recursively does the same with every <folder> child
def process_files_and_folders(node,&blk)   
  yield node, node.xpath('file')
  node.xpath('folder').each{ |folder| process_files_and_folders(folder,&blk) }
end

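The keys to this are (a) recursion (the method calls itself for every sub-folder) and (b) capturing the block the user passes with the &blk notation, then handing that same block on to the later calls.

Seen in action: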

require 'nokogiri'
doc = Nokogiri.XML(my_xml)
process_files_and_folders( doc.root ) do |folder,files|
  depth  = folder.ancestors.length-1  # Just for pretty text output indenting
  indent = "  "*depth                 # Just for pretty text output indenting
  if folder['name']
    puts "#{indent}Processing the folder named #{folder['name']}"
  else
    puts "#{indent}No folder name; probably the root element."
  end
  unless files.empty?
    puts "#{indent}There are #{files.length} files in '#{folder['name']}':"
    files.each{ |file| print indent, file['name'], "\n" }
  end
end

Result:

No folder name; probably the root element.
  Processing the folder named Folder A
    Processing the folder named Folder A1
    There are 1 files in 'Folder A1':
    a.txt
    Processing the folder named Folder A2
  Processing the folder named Folder B
    Processing the folder named Folder B1
    Processing the folder named Folder B2
      Processing the folder named Folder B21
      There are 1 files in 'Folder B21':
      b.txt

Answer 1: (score: 2)

I would do it like this:

require 'nokogiri'

doc = Nokogiri::XML(<<-xml)
<test>
   <folder name="Folder A">
      <folder name="Folder A1">
         <file name="a.txt">Cool file</file>
      </folder>
      <folder name="Folder A2"></folder>
   </folder>
   <folder name="Folder B">
      <folder name="Folder B1"></folder>
      <folder name="Folder B2">
         <folder name="Folder B21">
            <file name="b.txt"></file>
         </folder>
      </folder>
   </folder>
</test>
xml

# Collect all folders that have at least one child folder.
parent_folders = doc.xpath("//folder").select do |folder_node|
  folder_node.xpath("./folder").size > 0
end

# Iterate over each parent directory and collect the names of its
# sub-directories.
parent_directory = parent_folders.each_with_object({}) do |parent_dir,dir_hash|
  dir_hash[parent_dir['name']] = parent_dir.xpath("./folder").collect do |sub_dir|
    sub_dir['name']
  end
end

parent_directory
# => {"Folder A"=>["Folder A1", "Folder A2"],
#     "Folder B"=>["Folder B1", "Folder B2"],
#     "Folder B2"=>["Folder B21"]}

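Now you have a hash, parent_directory, that holds every directory (key) to sub-directories (value) relationship. With the Hash#[] method you can easily look up the sub-directories of a given directory. For example: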

parent_directory['Folder A'] # => ["Folder A1", "Folder A2"]

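Answer 2: (score: 0)

It's a little unclear what you are trying to do, but assuming you want to create a new directory structure on disk on a Linux system: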

doc.xpath("//folder[not(folder)]").each do |leaf|
   path = leaf.xpath("ancestor-or-self::folder").map{ |f| f['name'] }.join("/")
   # Pass the arguments separately so names containing spaces are not split by the shell
   system("mkdir", "-p", path)
end

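Here is what it does:

  • The first line finds all the lowest-level folders (the leaf folders in the XML).
  • The next line collects the names of all the enclosing folders and joins them with slashes to get the full "path".
  • Finally, the system command "mkdir -p" creates the lowest-level folder and every folder in between.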

Answer 3: (score: 0)

So, I later found out how to solve it.

To clarify, I intended to have a function like this:

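def create_structure(nodeset, current_folder)
    new_folder = "#{current_folder}/#{nodeset['name']}"
    Dir.mkdir(new_folder)
    create_files_in_current_folder(nodeset, new_folder)  # helper not shown here
    subnodeset = nodeset.xpath('folder')  # relative path: direct children only
    subnodeset.each do |node|
        create_structure(node, new_folder)
    end
end

This way I can replicate the structure in the XML onto the file system.

As for the solution, it was right in front of my eyes. Instead of "//folder" I have to use a relative path such as "folder" or "./folder": the first returns every folder regardless of where it sits in the XML structure, while the relative form returns only the folders directly under the current node. (A leading slash, as in "/folder", would be evaluated from the document root rather than from the current node.)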

I hope this helps, and thank you all for your answers. I'll try them as soon as I can.