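I'm trying to pull some data out of some Google Picasa XML and am having a bit of trouble.
Here is the actual XML (containing just one entry): http://pastie.org/1736008
Basically, I want to collect a few of the gphoto attributes. Ideally, what I'd like to do is: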
doc.xpath('//entry').map do |entry|
{:id => entry.children['gphoto:id'],
:thumb => entry.children['gphoto:thumbnail'],
:name => entry.children['gphoto:name'],
:count => entry.children['gphoto:numphotos']}
end
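However, that doesn't work. In fact, when I inspect the entry's children, I don't see any 'gphoto:xxx' elements among them at all, so I'm confused about how to find them.
Thanks!
Answer 0 (score: 2)
Here is some working code that uses nokogiri to extract the gphoto elements from your sample XML.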
#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
content = File.read('input.xml')
doc = Nokogiri::XML(content) {|config|
  config.options = Nokogiri::XML::ParseOptions::STRICT
}
# entry is in the feed's default namespace, hence the xmlns: prefix
hashes = doc.xpath('//xmlns:entry').map do |entry|
  {
    :id => entry.xpath('gphoto:id').inner_text,
    # gphoto:thumbnail is looked up on the entry's parent, not on the entry
    :thumb => entry.parent.xpath('gphoto:thumbnail').inner_text,
    :name => entry.xpath('gphoto:name').inner_text,
    :count => entry.xpath('gphoto:numphotos').inner_text
  }
end
puts hashes.inspect
# yields:
#
# [{:count=>"37", :name=>"Melody19Months", :thumb=>"http://lh3.ggpht.com/_Viv8WkAChHU/AAAAAAAAAAA/AAAAAAAAAAA/pNuu5PgnP1Y/s64-c/soopingsaw.jpg", :id=>"5582695833628950881"}]
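Note: the entry elements are in the feed's default namespace, which is why the XPath uses the xmlns: prefix, and gphoto:thumbnail is read from the entry's parent because it is not a child of the entry.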
Answer 1 (score: 0)
You can search for the entry nodes, then look inside each one to extract the nodes in the gphoto namespace:
require 'nokogiri'
doc = Nokogiri::XML(open('./test.xml'))
hashes = doc.search('//xmlns:entry').map do |entry|
  h = {}
  entry.search("*[namespace-uri()='http://schemas.google.com/photos/2007']").each do |gphoto|
    h[gphoto.name] = gphoto.text
  end
  h
end
require 'ap'
ap hashes
# >> [
# >> [0] {
# >> "id" => "5582695833628950881",
# >> "name" => "Melody19Months",
# >> "location" => "",
# >> "access" => "public",
# >> "timestamp" => "1299649559000",
# >> "numphotos" => "37",
# >> "user" => "soopingsaw",
# >> "nickname" => "sooping",
# >> "commentingEnabled" => "true",
# >> "commentCount" => "0"
# >> }
# >> ]
That returns all the //entry/gphoto:* nodes. If you only want certain ones, you can filter for what you want:
require 'nokogiri'
doc = Nokogiri::XML(open('./test.xml'))
hashes = doc.search('//xmlns:entry').map do |entry|
  h = {}
  entry.search("*[namespace-uri()='http://schemas.google.com/photos/2007']").each do |gphoto|
    h[gphoto.name] = gphoto.text if (%w[id thumbnail name numphotos].include?(gphoto.name))
  end
  h
end
require 'ap'
ap hashes
# >> [
# >> [0] {
# >> "id" => "5582695833628950881",
# >> "name" => "Melody19Months",
# >> "numphotos" => "37"
# >> }
# >> ]
Note that the original question tries to access gphoto:thumbnail, but there is no node matching //entry/gphoto:thumbnail, so it can't be found.
Another way to write the search using namespaces is:
require 'nokogiri'
doc = Nokogiri::XML(open('./test.xml'))
hashes = doc.search('//xmlns:entry').map do |entry|
  h = {}
  entry.search("*").each do |gphoto|
    h[gphoto.name] = gphoto.text if (
      (gphoto.namespace.prefix=='gphoto') &&
      (%w[id thumbnail name numphotos].include?(gphoto.name))
    )
  end
  h
end
require 'ap'
ap hashes
# >> [
# >> [0] {
# >> "id" => "5582695833628950881",
# >> "name" => "Melody19Months",
# >> "numphotos" => "37"
# >> }
# >> ]
Instead of filtering in XPath, this asks Nokogiri to look at each node's namespace attribute.