HTML data parsing problem in Nokogiri

Asked: 2015-06-04 07:42:22

Tags: ruby-on-rails ruby nokogiri

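I have an HTML file containing plain HTML.

I am using Ruby 1.8.7 and need to extract the PO numbers and tracking numbers. Some records have no tracking number, and in that case I need to put 'nil' in its place.

So far I have not been able to get a correct solution.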

<html>
  <head>
  </head>
  <body>
    <div>***NOTE*** <br> ITems<br><br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br> TROXLER RD<br><br>India<br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br><br><br>
    <br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br><br>Shipped Via : UPS    Track It : <a href= ab.com> 1Z2559690357791340</a><br><font face="COURIER" size="2" color="black"><br>
    <br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br>
    <br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br> TROXLER RD<br><br>India<br><br>Shipped Via : UPS    Track It : <a href= ab.com> 1Z2559690357791340</a><br><font face="COURIER" size="2" color="black"><br>
  </body>
</html>

I have code like this:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

PAGE_URL = "a.html"

page = Nokogiri::HTML(open(PAGE_URL))
data = page.css("body").text

po_numbers = data.scan(/Invoice Number : \[\d+\] PO Number : \[(\d+)\]/).flatten
tracking_numbers = page.css("a").text.split

[["PO Number", "Tracking Number"]].concat(po_numbers.zip(tracking_numbers))
puts po_numbers
puts tracking_numbers


=> po_numbers = ["7894562", "7894562", "7894562","7894562","7894562"]
=> tracking_numbers = ["1Z2559690357791340", "1Z2559690357791340"]

=> po_numbers.zip(tracking_numbers)
=> [["7894562", "1Z2559691257791340"], ["7894562", "1Z2559690357791340"], ["7894562", "1Z2559690357791340"], ["7894562", "nil"], ["7894562", "nil"]]

What I want is
=> [["7894562", "1Z2559691257791340"], ["7894562", "nil"], ["7894562", "1Z2559690357791340"], ["7894562", "nil"], ["7894562", "1Z2559690357791340"]]

1 answer:

Answer 0 (score: 0)

I suggest using a Hash that maps po_numbers to tracking_numbers, so that each PO number stays associated with its tracking number.
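
One way to keep each PO number tied to its own tracking number is to process the text record by record rather than scanning for PO numbers and links separately. The following is only a minimal sketch under some assumptions: it takes every record to start with "Invoice Number :" and any tracking number to follow a "Track It :" label, as in the sample HTML. The method name `pair_po_with_tracking` and the shortened sample text are made up for illustration; in the question's code the input would be the string returned by `page.css("body").text`.

```ruby
# Sketch: pair each PO number with the tracking number found in the same
# record, or nil when the record has none. The method name is hypothetical;
# the "Invoice Number :" and "Track It :" labels come from the sample HTML.
def pair_po_with_tracking(text)
  # Every record starts with "Invoice Number :"; drop the text before the first one.
  records = text.split("Invoice Number :").drop(1)
  records.map do |record|
    po = record[/PO Number : \[(\d+)\]/, 1]
    tracking = record[/Track It :\s*(\S+)/, 1] # nil when the label is absent
    [po, tracking]
  end
end

# Stand-in for page.css("body").text (shortened, illustrative sample data).
sample = <<TEXT
***NOTE*** ITems
Invoice Number : [982157] PO Number : [7894562] Shipped To:HOHNE TROXLER RDIndia
Invoice Number : [982157] PO Number : [7894562] Shipped To:HOHNE
Invoice Number : [982157] PO Number : [7894562] Shipped To:HOHNEShipped Via : UPS    Track It :  1Z2559690357791340
TEXT

pairs = pair_po_with_tracking(sample)
p pairs
```

If the literal string 'nil' is wanted instead of a real `nil`, `tracking || 'nil'` could be used when building each pair. The pairs could also be turned into a Hash as suggested above, but only if the PO numbers are unique; in the sample they repeat, so an array of pairs keeps every record.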