Question

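I'm trying to get some data out of some Google Picasa XML and I'm having a bit of trouble.

Here is the actual XML (containing just one entry): http://pastie.org/1736008

Basically, I want to collect a few of the gphoto attributes. Ideally, what I'd like to do is: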

doc.xpath('//entry').map do |entry|
  {:id => entry.children['gphoto:id'],
   :thumb => entry.children['gphoto:thumbnail'],
   :name => entry.children['gphoto:name'],
   :count => entry.children['gphoto:numphotos']}
end

However, this doesn't work. In fact, when I inspect an entry's children, I don't see any 'gphoto:xxx' elements at all, so I'm confused about how to find them.

Thanks!

Answer 1

Here is some working code that uses Nokogiri to extract the gphoto elements from your sample XML.

#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
content = File.read('input.xml')
doc = Nokogiri::XML(content) {|config| 
          config.options = Nokogiri::XML::ParseOptions::STRICT
      }

hashes = doc.xpath('//xmlns:entry').map do |entry|
  {
    :id => entry.xpath('gphoto:id').inner_text,
    :thumb => entry.parent.xpath('gphoto:thumbnail').inner_text,
    :name => entry.xpath('gphoto:name').inner_text,
    :count => entry.xpath('gphoto:numphotos').inner_text
  }
end

puts hashes.inspect

# yields: 
#
# [{:count=>"37", :name=>"Melody19Months", :thumb=>"http://lh3.ggpht.com/_Viv8WkAChHU/AAAAAAAAAAA/AAAAAAAAAAA/pNuu5PgnP1Y/s64-c/soopingsaw.jpg", :id=>"5582695833628950881"}]

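Notes:

The sample XML in your paste needs a closing "feed" tag. Fixed here.
To find the entry elements in the XPath expression, we have to use a namespace prefix, so "xmlns:entry" rather than just "entry". The latter (used in the original code) will find no elements: it looks for elements in the null namespace, but in your sample they all inherit the default namespace declared on the feed element. Aaron Patterson wrote a (Nokogiri-centric) introduction to the issue here, and there is another one here.
The gphoto:thumbnail element is a child of the feed element, not of each entry. I made a small (hacky) adjustment for that which keeps the design of the original example, but it would be far better to query the value of this element only once per feed (and perhaps copy it into the entry hashes afterwards, if each of them really needs its own copy).
Configuring Nokogiri to be strict isn't actually required, but it is nice for catching problems early.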

Answer 2

You can search for the entry nodes, then look inside each one to extract the gphoto-namespaced nodes:

require 'nokogiri'

doc = Nokogiri::XML(open('./test.xml'))
hashes = doc.search('//xmlns:entry').map do |entry|
  h = {}
  entry.search("*[namespace-uri()='http://schemas.google.com/photos/2007']").each do |gphoto|
    h[gphoto.name] = gphoto.text
  end
  h
end

require 'ap'
ap hashes
# >> [
# >>     [0] {
# >>                        "id" => "5582695833628950881",
# >>                      "name" => "Melody19Months",
# >>                  "location" => "",
# >>                    "access" => "public",
# >>                 "timestamp" => "1299649559000",
# >>                 "numphotos" => "37",
# >>                      "user" => "soopingsaw",
# >>                  "nickname" => "sooping",
# >>         "commentingEnabled" => "true",
# >>              "commentCount" => "0"
# >>     }
# >> ]

That returns all the //entry/gphoto:* nodes. If you only want certain ones, you can filter for what you want:

require 'nokogiri'

doc = Nokogiri::XML(open('./test.xml'))
hashes = doc.search('//xmlns:entry').map do |entry|
  h = {}
  entry.search("*[namespace-uri()='http://schemas.google.com/photos/2007']").each do |gphoto|
    h[gphoto.name] = gphoto.text if (%w[id thumbnail name numphotos].include?(gphoto.name))
  end
  h
end

require 'ap'
ap hashes

# >> [
# >>     [0] {
# >>                "id" => "5582695833628950881",
# >>              "name" => "Melody19Months",
# >>         "numphotos" => "37"
# >>     }
# >> ]

Note that the original question tried to access gphoto:thumbnail, but no node matches //entry/gphoto:thumbnail, so it can't be found.

Another way to write the search using namespaces is:

require 'nokogiri'

doc = Nokogiri::XML(open('./test.xml'))
hashes = doc.search('//xmlns:entry').map do |entry|
  h = {}
  entry.search("*").each do |gphoto|
    h[gphoto.name] = gphoto.text if (
      (gphoto.namespace.prefix=='gphoto') && 
      (%w[id thumbnail name numphotos].include?(gphoto.name))
    )
  end
  h
end

require 'ap'
ap hashes

# >> [
# >>     [0] {
# >>                "id" => "5582695833628950881",
# >>              "name" => "Melody19Months",
# >>         "numphotos" => "37"
# >>     }
# >> ]

Instead of using XPath, this asks Nokogiri to look at each node's namespace.

Parsing Google Picasa API XML with Nokogiri - namespace problem?
