I'm trying (for testing purposes) to parse a Google Merchant XML feed, defined as:
<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="cs" xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
<link rel="alternate" type="text/html" href="http://www.example.com"/>
<link rel="self" type="application/atom+xml" href="http://www.example.com/cs/feed/google.xml"/>
<title>EasyOptic</title>
<updated>2014-08-01T16:31:11Z</updated>
<entry>
<title>Sluneční Brýle Producer 1 133a code_color_1 Color 1 133a RayBan</title>
<link href="http://www.example.com/cs/katalog/price-category-1-style-1-optical-glasses-producer-1-rayban-133a-code_color_1-color-1"/>
<summary>Moc krásný a velmi levný produkt</summary>
<updated>2014-08-01T16:31:11Z</updated>
<g:id>EO111</g:id>
<g:condition>new</g:condition>
<g:price>100 Kč</g:price>
<g:availability>in stock</g:availability>
<g:image_link>http://www.example.com/images/fallback/default.png</g:image_link>
<g:additional_image_link>http://www.example.com/images/fallback/default.png</g:additional_image_link>
<g:brand>Producer 1</g:brand>
<g:mpn>EO111</g:mpn>
<g:gender>female</g:gender>
<g:google_product_category>Apparel &amp; Accessories &gt; Clothing Accessories &gt; Sunglasses</g:google_product_category>
<g:product_type>Sluneční Brýle </g:product_type>
</entry>
<entry>
<title>Sluneční Brýle Producer 1 133a code_color_1 Color 1 133a RayBan</title>
<link href="http://www.example.com/cs/katalog/price-category-1-style-1-optical-glasses-producer-1-rayban-133a-code_color_1-color-1"/>
<summary>Moc krásný a velmi levný produkt</summary>
<updated>2014-08-01T16:31:10Z</updated>
<g:id>EO111</g:id>
<g:condition>new</g:condition>
<g:price>100 Kč</g:price>
<g:availability>in stock</g:availability>
<g:image_link>http://www.example.com/images/fallback/default.png</g:image_link>
<g:additional_image_link>http://www.example.com/images/fallback/default.png</g:additional_image_link>
<g:brand>Producer 1</g:brand>
<g:mpn>EO111</g:mpn>
<g:gender>female</g:gender>
<g:google_product_category>Apparel &amp; Accessories &gt; Clothing Accessories &gt; Sunglasses</g:google_product_category>
<g:product_type>Sluneční Brýle </g:product_type>
</entry>
</feed>
using this Ruby script:
require 'nokogiri'

def have_node_with_children(body, path_type, path, children_names)
  doc = Nokogiri::XML(body)

  case path_type
  when :xpath
    nodes = doc.xpath(path)
  when :css
    nodes = doc.css(path)
  else
    nodes = doc.xpath(path)
  end

  nodes.each do |node|
    nchildren_names = []
    for child in node.children
      # Nokogiri treats formatting whitespace as a blank node named "text"
      nchildren_names << child.name unless child.to_s.strip == ""
    end
    puts("demanded_nodes: #{children_names.sort.join(", ")} , nodes found: #{nchildren_names.sort.join(", ")} ")
    missing = children_names - nchildren_names
    over = nchildren_names - children_names
    puts("Missing: #{missing.sort.join(", ")} , Over: #{over.sort.join(", ")} ")
  end
end

EXPECTED_ENTRY_NODES = [
  'title',
  'link',
  'summary',
  'updated',
  'g:id',
  'g:condition',
  'g:price',
  'g:availability',
  'g:image_link',
  'g:additional_image_link',
  'g:brand',
  'g:mpn',
  'g:gender',
  'g:google_product_category',
  'g:product_type'
]

# File.read opens, reads, and closes the file in one call
have_node_with_children(File.read('google.xml'), :xpath, '//xmlns:entry', EXPECTED_ENTRY_NODES)
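The 'entry' nodes are found (thanks to this tip).
But when I collect their children, child.name returns the name without the namespace prefix (for example, the name of <g:brand> comes back as 'brand').
So the comparison with the required fields fails.
Does anyone know how to get a node's name together with its namespace prefix?
If I remove the namespace definitions everything works, but I can't change the original XML. I use this check in rspec request tests, so another namespace with the same base node names could show up.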
Answer 0 (score: 2)
require 'nokogiri'

xml_doc = Nokogiri::XML(xml)  # xml holds the feed contents as a string

xml_doc.xpath("//xmlns:entry").each do |entry|
  # Step through all element nodes that are direct children of <entry>
  entry.xpath("./*").each do |element|
    # The default namespace has a nil prefix; &. also covers elements with no namespace
    prefix = element.namespace&.prefix
    puts prefix ? "#{prefix}:#{element.name}" : element.name
  end

  break  # only show output for the first <entry>
end
--output:--
title
link
summary
updated
g:id
g:condition
g:price
g:availability
g:image_link
g:additional_image_link
g:brand
g:mpn
g:gender
g:google_product_category
g:product_type
Now, about this:
for child in node.children
A well-grounded Rubyist never uses a for loop. A for loop just calls each(), so Rubyists call each() directly:
node.children.each do |child|