I'm trying to figure out how to get the Make and Model out of the XML returned by a URL and put them into a CSV. Here is the XML the URL returns:
<VINResult xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://basicvalues.pentondata.com/">
<Vehicles>
<Vehicle>
<ID>131497</ID>
<Product>TRUCK</Product>
<Year>1993</Year>
<Make>Freightliner</Make>
<Model>FLD12064T</Model>
<Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
</Vehicle>
<Vehicle>
<ID>131497</ID>
<Product>TRUCK</Product>
<Year>1993</Year>
<Make>Freightliner</Make>
<Model>FLD12064T</Model>
<Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
</Vehicle>
</Vehicles>
<Errors/>
<InvalidVINMsg/>
</VINResult>
Here is my code so far:
require 'csv'
require 'rubygems'
require 'nokogiri'
require 'open-uri'
vincarriercsv = 'vincarrier.csv'
vindetails = 'vindetails.csv'
vinurl = 'http://redacted/LookUp_VIN?key=redacted&vin='
CSV.open(vindetails, "wb") do |details|
  CSV.foreach(vincarriercsv) do |row|
    vinxml = Nokogiri::HTML(vinurl + row[1])
    make = vinxml.xpath('//VINResult//Vehicles//Vehicle//Make').text
    model = vinxml.xpath('//VINResult//Vehicles//Vehicle//Model').text
    details << [ row[0], row[1], make, model ]
  end
end
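For some reason the URL returns the same data twice, but I only need the first result. So far my attempts to get the Make and Model out of the XML have failed... any ideas?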
Answer 0: (score: 1)
Here is how to get the make and model data. Converting it to CSV is left to you:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<VINResult xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://basicvalues.pentondata.com/">
<Vehicles>
<Vehicle>
<ID>131497</ID>
<Product>TRUCK</Product>
<Year>1993</Year>
<Make>Freightliner</Make>
<Model>FLD12064T</Model>
<Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
</Vehicle>
<Vehicle>
<ID>131497</ID>
<Product>TRUCK</Product>
<Year>1993</Year>
<Make>Freightliner</Make>
<Model>FLD12064T</Model>
<Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
</Vehicle>
</Vehicles>
<Errors/>
<InvalidVINMsg/>
</VINResult>
EOT
vehicle_make_and_models = doc.search('Vehicle').map{ |vehicle|
  [
    'make', vehicle.at('Make').content,
    'model', vehicle.at('Model').content
  ]
}
This results in:
vehicle_make_and_models # => [["make", "Freightliner", "model", "FLD12064T"], ["make", "Freightliner", "model", "FLD12064T"]]
If you don't want the field names:
vehicle_make_and_models = doc.search('Vehicle').map{ |vehicle|
  [
    vehicle.at('Make').content,
    vehicle.at('Model').content
  ]
}
vehicle_make_and_models # => [["Freightliner", "FLD12064T"], ["Freightliner", "FLD12064T"]]
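Note: you have XML, not HTML. Don't assume Nokogiri treats them the same, or that the differences are insignificant. Nokogiri parses XML strictly, because XML is a strict standard.
I use CSS selectors unless I absolutely must use XPath. CSS results in clearer selectors most of the time, which makes the code easier to read.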
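vinxml.xpath('//VINResult//Vehicles//Vehicle//Make').text doesn't do what you want. A leading // means "start at the top of the document and search at any depth", and each later // again searches every descendant of the nodes matched so far, so the expression finds all matching nodes rather than one. xpath returns all of them as a NodeSet, not just a particular node, and text returns the text of every node in that NodeSet joined into one concatenated string, which is probably not what you want.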
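I prefer to use search instead of xpath or css. It returns a NodeSet like the other two, but it also lets us use either CSS or XPath selectors. If your particular selector is ambiguous and could be interpreted as either CSS or XPath, you can use the explicit form. Likewise, you can use at, or the explicit at_xpath or at_css, to find just the first matching node, which is equivalent to search('foo').first.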
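Answer 1: (score: 0)
You could also do something like the following, which places all the vehicles in an Array and each vehicle's attributes in a Hash: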
require 'nokogiri'
doc = Nokogiri::XML(open(YOUR_XML_FILE))
vehicles = doc.search("Vehicle").map do |vehicle|
  Hash[
    vehicle.children.map do |child|
      [child.name, child.text] unless child.text.chomp.strip == ""
    end.compact
  ]
end
#=>[{"ID"=>"131497", "Product"=>"TRUCK", "Year"=>"1993", "Make"=>"Freightliner", "Model"=>"FLD12064T", "Description"=>"120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes Power Steering 6x4 (SBA - Set Back Axle)"}, {"ID"=>"131497", "Product"=>"TRUCK", "Year"=>"1993", "Make"=>"Freightliner", "Model"=>"FLD12064T", "Description"=>"120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes Power Steering 6x4 (SBA - Set Back Axle)"}]
You can then access all the attributes of an individual vehicle, e.g.
vehicles.first["ID"]
#=> "131497"
vehicles.first["Year"]
#=> "1993"
and so on.