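I have an HTML file containing plain HTML.
I am using Ruby 1.8.7 and need to extract the PO numbers and the tracking numbers. Some of the tracking numbers are missing, and in those cases I need to add 'nil'.
So far I have not been able to get a correct solution.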
<html>
<head>
</head>
<body>
<div>***NOTE*** <br> ITems<br><br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br> TROXLER RD<br><br>India<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br><br><br>
<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br><br>Shipped Via : UPS Track It : <a href= ab.com> 1Z2559690357791340</a><br><font face="COURIER" size="2" color="black"><br>
<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br>
<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br> TROXLER RD<br><br>India<br><br>Shipped Via : UPS Track It : <a href= ab.com> 1Z2559690357791340</a><br><font face="COURIER" size="2" color="black"><br>
</body>
</html>
I have code like this:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
PAGE_URL = "a.html"
page = Nokogiri::HTML(open(PAGE_URL))
data = page.css("body").text
po_numbers = data.scan(/Invoice Number : \[\d+\] PO Number : \[(\d+)\]/).flatten
tracking_numbers = page.css("a").text.split
[["PO Number", "Tracking Number"]].concat(po_numbers.zip(tracking_numbers))
puts po_numbers
puts tracking_numbers
=> po_numbers = ["7894562", "7894562", "7894562","7894562","7894562"]
=> tracking_numbers = ["1Z2559690357791340", "1Z2559690357791340"]
=> po_numbers.zip(tracking_numbers)
=> [["7894562", "1Z2559691257791340"], ["7894562", "1Z2559690357791340"], ["7894562", "1Z2559690357791340"], ["7894562", "nil"], ["7894562", "nil"]]
What I want is:
=> [["7894562", "1Z2559691257791340"], ["7894562", "nil"], ["7894562", "1Z2559690357791340"], ["7894562", "nil"], ["7894562", "1Z2559690357791340"]]
Answer 0 (score: 0)
I suggest using a Hash to store po_numbers and tracking_numbers, so that you can associate po_numbers with tracking_numbers.
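
A minimal sketch of that idea, under a few assumptions. Because the same PO number repeats in the sample, a single Hash keyed by PO number would collapse the rows, so this version builds one small Hash per record instead. It splits the raw HTML at each "Invoice Number" and looks for a tracking link inside that chunk only, so a record without a link ends up with nil. The helper name extract_records and the shortened sample string are made up for illustration, and plain regular expressions stand in for Nokogiri to keep the example self-contained.

```ruby
# Shortened stand-in for the body of a.html (an assumption for this sketch):
# three records, only the second one has a tracking link.
html = <<'HTML'
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br><br><br>
<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br><br>Shipped Via : UPS Track It : <a href= ab.com> 1Z2559690357791340</a><br>
<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br>
HTML

# Hypothetical helper: returns one Hash per record, with :tracking set to nil
# when the record contains no "Track It" link.
def extract_records(html)
  # Every record begins with "Invoice Number", so splitting there yields
  # one chunk per record (any leading text is skipped below).
  html.split("Invoice Number").map do |chunk|
    po = chunk[/PO Number\s*:\s*\[(\d+)\]/, 1]
    next unless po # not a record, e.g. the text before the first invoice

    # Search for the tracking link within this record's chunk only.
    tracking = chunk[/Track It\s*:\s*<a[^>]*>\s*([^<\s]+)\s*<\/a>/, 1]
    { :po => po, :tracking => tracking }
  end.compact
end

records = extract_records(html)
pairs = records.map { |r| [r[:po], r[:tracking]] }
p pairs
# prints [["7894562", nil], ["7894562", "1Z2559690357791340"], ["7894562", nil]]
```

If the literal string 'nil' is wanted rather than Ruby's nil, replacing r[:tracking] with (r[:tracking] || "nil") in the last map should do it. Pairing inside each record, instead of zipping two separately collected arrays, is what keeps a missing tracking number from shifting the later ones.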