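I have an XML news feed from which I want to capture the stories, along with a few elements of each story. The original XML is here, and this is an example of a single story.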
<news:NewsResult>
<news:Title>Essex Police/Fire</news:Title>
<news:Url>http://www.gloucestertimes.com/local/x2118804357/Essex-Police-Fire</news:Url>
<news:Source>Gloucester Daily Times</news:Source>
<news:Snippet>ESSEX — An attempt to serve a summons to a Piper Lane resident was thwarted at 2:25 p.m. Monday when police discovered that the person no longer lives at that address. Alarms were set off in error on Belcher Street at 3:12 p.m. Monday, on Main Street at ...</news:Snippet>
So far, my code looks like this:
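require 'nokogiri'

def xml2Var(xmlin)
  # Parse the received XML with Nokogiri
  doc = Nokogiri::XML(xmlin)
  # Remove namespaces so elements can be matched without the news: prefix
  doc.remove_namespaces!
  # Debug: print the parsed document
  # p doc
  # Extract values (each call returns every match in the whole document)
  title = doc.xpath("//Title")
  snippet = doc.xpath("//Snippet")
  url = doc.xpath("//Url")
  source = doc.xpath("//Source")
end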
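I want to put these values into an array for each story, then add each story to an array of stories so that I can display them in my Rails app. I managed to do that, but then I could not display each story and its attributes. I suspect my use of XPath is wrong?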
Answer 0 (score: 4)
To put the stories into an array, you can do the following:
doc.css("NewsResult").map{|nr| [nr.at('Title'),nr.at('Snippet'),nr.at('Url'),nr.at('Source')].map(&:text)}
Answer 1 (score: 1)
Given that you have four arrays of values, you can interleave them:
titles = %w[t1 t2 t3 t4]
snippets = %w[n1 n2 n3 n4]
urls = %w[u1 u2 u3 u4]
sources = %w[s1 s2 s3 s4]
pp titles.zip(snippets,urls,sources)
#=> [["t1", "n1", "u1", "s1"],
#=> ["t2", "n2", "u2", "s2"],
#=> ["t3", "n3", "u3", "s3"],
#=> ["t4", "n4", "u4", "s4"]]
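However, this can be dangerous. If the arrays do not all contain exactly the same number of items (for example, if one story is missing its source), the values end up associated with the wrong stories: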
titles = %w[t1 t2 t3 t4]
snippets = %w[n1 n2 n3 n4]
urls = %w[u1 u2 u3 u4]
sources = %w[s1 s3 s4]
pp titles.zip(snippets,urls,sources)
#=> [["t1", "n1", "u1", "s1"],
#=> ["t2", "n2", "u2", "s3"],
#=> ["t3", "n3", "u3", "s4"],
#=> ["t4", "n4", "u4", nil]]
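If you do keep the four separate arrays, one cheap safeguard is to compare their lengths before zipping. The following is only a minimal sketch: `zip_strict` is a hypothetical helper name, not something Ruby or Nokogiri provides. It also cannot catch the case where every array is missing a different item, since the lengths would still match.

```ruby
# Hypothetical helper (not part of Ruby or Nokogiri): zip several arrays,
# but refuse to do so when their lengths differ.
def zip_strict(first, *rest)
  lengths = [first, *rest].map(&:length)
  unless lengths.uniq.length == 1
    raise ArgumentError, "array lengths differ: #{lengths.inspect}"
  end
  first.zip(*rest)
end

titles   = %w[t1 t2 t3 t4]
snippets = %w[n1 n2 n3 n4]
urls     = %w[u1 u2 u3 u4]
sources  = %w[s1 s3 s4] # one source is missing

begin
  zip_strict(titles, snippets, urls, sources)
rescue ArgumentError => e
  puts "Refusing to zip: #{e.message}"
end
```

Because equal lengths do not prove correct alignment, this only narrows the problem; it does not remove it.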
It is better to do what @pguardiario suggests: find each news result, then map it to its component parts. Written a little more concisely:
parts = %w[Title Snippet Url Source]
all = doc.css("NewsResult").map{ |nr| parts.map{ |part| nr.at(part).text } }
This gives you an array of four-value arrays, where [0] is the text of the title, [1] is the snippet, and so on:
all.each do |title,snippet,url,source|
puts "Title: #{title} @ #{url} came from #{source}"
end
If you want a more practical construct, I would personally create a Hash for each story, so that I am not accessing values by magic indexes:
results = doc.css("NewsResult").map do |result|
Hash[ parts.map{ |part| [part.downcase.to_sym, result.at(part).text] } ]
end
#…later…
results.each do |result|
puts "Title: #{result[:title]} @ #{result[:url]} came from #{result[:source]}"
end
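If you would rather call methods than look up Hash keys, for example in a Rails view, a `Struct` is another option. This is only a sketch: `Story` is a hypothetical name, and `rows` is placeholder data standing in for the array of four-value arrays built from the document.

```ruby
# Story is a hypothetical name; its fields mirror the four parts
# (Title, Snippet, Url, Source) in the same order.
Story = Struct.new(:title, :snippet, :url, :source)

# Placeholder data standing in for the array built from the XML.
rows = [
  %w[t1 n1 u1 s1],
  %w[t2 n2 u2 s2]
]

stories = rows.map { |values| Story.new(*values) }

stories.each do |story|
  puts "Title: #{story.title} @ #{story.url} came from #{story.source}"
end
```

A Struct raises `NoMethodError` on a misspelled field name, whereas a Hash lookup with a wrong key quietly returns `nil`.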