Question

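I have an HTML document that contains a large number of purchase order (PO) numbers.

I am trying to retrieve them, but I'm not very comfortable with regular expressions. I was able to find the first PO number, but I need all of them. How can I do that?

My HTML file looks like this: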

<html>
 <head></head>
<body>
<br>
Invoice Number : [12346456] PO Number : [6464645]

<hr>
Invoice Number : [90156460] PO Number : [6416462]

<hr>
Invoice Number : [90868741] PO Number : [1613464]

</body>
</html>

My code:

po_count = page.css('body').text.scan(/\d+/)[1].to_i

This gives me only the first PO number, 6464645. I need all of the PO numbers.

Answer 1

po_count = page.css('body').text.scan(/\d+/)

will produce an array of all matches:

po_count = ["12346456", "6464645", "90156460", "6416462", "90868741", "1613464"]

po_count = po_count.map{|e| e.to_i}

will convert each match to an integer, giving:

po_count = [12346456, 6464645, 90156460, 6416462, 90868741, 1613464]
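Note that the array above contains the invoice numbers as well as the PO numbers, because `/\d+/` matches every run of digits. If only the PO numbers are wanted, one possible follow-up is to keep every second match. This is only a sketch: it assumes the numbers strictly alternate invoice, PO, invoice, PO as in the sample, and the helper name `every_second_number` and the inline sample text are made up for illustration (in the question the text would come from `page.css('body').text`).

```ruby
# Hypothetical helper: assumes digit runs strictly alternate
# invoice number, PO number, invoice number, PO number, ...
def every_second_number(text)
  text.scan(/\d+/)                       # every run of digits, as strings
      .each_slice(2)                     # group into [invoice, po] pairs
      .map { |_invoice, po| po.to_i }    # keep only the PO half of each pair
end

# Sample text standing in for page.css('body').text
text = <<~TEXT
  Invoice Number : [12346456] PO Number : [6464645]
  Invoice Number : [90156460] PO Number : [6416462]
  Invoice Number : [90868741] PO Number : [1613464]
TEXT

p every_second_number(text)
```

This breaks as soon as any other number appears in the body text, so matching on the "PO Number" label is usually the safer choice.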

Answer 2

Not very lean, but it should work:

po_numbers = []
page.css('body').text.scan(/Invoice Number : \[\d+\] PO Number : \[(\d+)\]/) do
  po_numbers << $1.to_i
end
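A somewhat shorter variant of the same idea, offered as a sketch rather than a definitive version: when the pattern has a capture group, `String#scan` returns an array of arrays of captures, so the PO numbers can be collected without a block or `$1`. The method name `po_numbers_from` and the inline sample text are assumptions for illustration. The pattern here matches only the `PO Number : [...]` label, which is looser than the pattern above because it does not require a preceding invoice number.

```ruby
# Hypothetical helper: collects every number that follows "PO Number : ["
def po_numbers_from(text)
  text.scan(/PO Number : \[(\d+)\]/)  # => [["6464645"], ["6416462"], ...]
      .flatten                        # one flat array of digit strings
      .map(&:to_i)                    # convert each string to an integer
end

# Sample text standing in for page.css('body').text
text = <<~TEXT
  Invoice Number : [12346456] PO Number : [6464645]
  Invoice Number : [90156460] PO Number : [6416462]
  Invoice Number : [90868741] PO Number : [1613464]
TEXT

p po_numbers_from(text)
```

If the spacing around the colon varies in the real document, the literal spaces in the pattern could be replaced with `\s*`.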

Regular expression to read n occurrences
