Question

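I need to parse the total amount out of different files. Each file has a different layout, so the lines I need to parse will vary.

What regular expression would capture the number that comes after "Total"?

It needs to be case-insensitive and should take the closest match after "Total". There can be anything before or after "Total"; I need the first number that comes after it.

For example: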

from string "Service charges: 10 Total: 100 Shipping: 10"
from string "Service charges: 10 Total Amount: 100 Shipping: 10"
from string "Service charges: 10 Grand Total: 100 Shipping: 10"
from string "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10"

In all of the cases above, the output should be 100.

Answer 1

If what you're really asking about is pattern matching across various strings, look at using scan and grabbing the numeric strings:

[
  "Service charges: 10 Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount: 100 Shipping: 10",
  "Service charges: 10 Grand Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s.scan(/\d+/)[1] }
=> ["100", "100", "100", "100"]

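This assumes you want the second number in each string.

If that order is going to change, which is unlikely since it looks like you're scanning invoices, then a variation of the pattern and/or scan will work. This switches it up and uses a standard regex search based on the position of "Total", some possible intervening text, then ":" and the total value: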

[
  "Service charges: 10 Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount: 100 Shipping: 10",
  "Service charges: 10 Grand Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s[/Total.*?: (\d+)/, 1] }
=> ["100", "100", "100", "100"]
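To see how the two approaches differ, here is a minimal sketch using a made-up line in which the total comes first. The string and the helper names second_number and number_after_total are assumptions for illustration, not part of the question:

```ruby
# Hypothetical invoice line (not from the question): the total comes first.
REORDERED = "Total: 100 Service charges: 10 Shipping: 10"

# Position-based: take the second run of digits, wherever it appears.
def second_number(line)
  line.scan(/\d+/)[1]
end

# Keyword-based: take the digits after "Total", optional text, and ": ".
def number_after_total(line)
  line[/Total.*?: (\d+)/, 1]
end

puts second_number(REORDERED)       # prints 10 (the service charge)
puts number_after_total(REORDERED)  # prints 100
```

The position-based version silently returns the wrong field once the layout changes, while the keyword-based one keeps following "Total".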

To get integer values, append to_i inside the map block:

[
  "Service charges: 10 Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount: 100 Shipping: 10",
  "Service charges: 10 Grand Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s[/Total.*?: (\d+)/, 1].to_i }
=> [100, 100, 100, 100]

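For your sample strings, it is better to use a case-sensitive pattern to match "Total", unless you know you will encounter "total" in lowercase. And in that case, you should show such an example.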
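If mixed capitalization does turn up, adding the i flag to the same pattern is probably enough. A small sketch, where the input strings and method names are invented for illustration:

```ruby
# Hypothetical inputs with varying capitalization (not from the question).
MIXED_CASE = [
  "grand total: 100 Shipping: 10",
  "TOTAL AMOUNT: 100",
  "Total: 100",
]

# Case-sensitive: only matches the exact spelling "Total".
def total_case_sensitive(line)
  line[/Total.*?: (\d+)/, 1]
end

# Case-insensitive: the trailing i flag also accepts "total", "TOTAL", etc.
def total_case_insensitive(line)
  line[/Total.*?: (\d+)/i, 1]
end

p MIXED_CASE.map { |s| total_case_sensitive(s) }    # [nil, nil, "100"]
p MIXED_CASE.map { |s| total_case_insensitive(s) }  # ["100", "100", "100"]
```

Note that calling to_i on a nil result would raise NoMethodError, so the case-sensitive version needs a nil check if unmatched lines are possible.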

Answer 2

I think you can do it like this:

/Total[^:]*:\s+([0-9]+)/i

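Explanation:

Total searches for "total"
[^:]* followed by anything, or nothing, until a colon ":" is found
:\s+ reads the colon and any whitespace after it (the + could be * instead)
([0-9]+) reads the digits into a group for later retrieval -> 100

I'm not sure how to specify case insensitivity in the environment you're using, but it can usually be done with a flag, as I indicated with the i.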

Here is a fiddle as an example.
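Since the other answers use Ruby, here is a rough sketch of how this pattern might be applied there, including the \s* variant mentioned in the explanation. The constant and method names, and the "total:100" input, are assumptions for illustration:

```ruby
# The pattern from this answer; in Ruby the trailing i makes it case-insensitive.
TOTAL_STRICT = /Total[^:]*:\s+([0-9]+)/i
# Variant using \s* so that a missing space after the colon is tolerated.
TOTAL_LOOSE  = /Total[^:]*:\s*([0-9]+)/i

# Returns the captured total as an Integer, or nil when nothing matches.
def extract_total(line, pattern)
  match = pattern.match(line)
  match && match[1].to_i
end

sample = "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10"
p extract_total(sample, TOTAL_STRICT)         # 100

# Hypothetical input with no space after the colon.
p extract_total("total:100", TOTAL_STRICT)    # nil, because \s+ needs whitespace
p extract_total("total:100", TOTAL_LOOSE)     # 100
```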

Answer 3

# assuming you have all your files ready in an array
a = ["Service charges: 10 Total: 100 Shipping: 10",  "Service charges: 10 Total Amount: 100 Shipping: 10", "Service charges: 10 Grand Total: 100 Shipping: 10", "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10"]
# we find every total with the following regexp
a.map {|s| s[/total[^\d]*(?<total>\d+)/i, 'total']}
#=> ["100", "100", "100", "100"]

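The regex is /total[^\d]*(?<total>\d+)/i. It looks for the word "total" and ignores any following characters until it finds a number (which it returns in the capture group). The i option makes it case-insensitive.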
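The quantifier on the capture group matters when no digits follow "total". The code above uses \d+; a variant with \d* would still match and return an empty string instead of nil. A short sketch with a made-up input and hypothetical method names:

```ruby
# Hypothetical line where no digits follow "Total" (not from the question).
NO_AMOUNT = "Service charges: 10 Total: n/a"

# \d+ requires at least one digit, so the lookup fails and returns nil.
def total_with_plus(line)
  line[/total[^\d]*(?<total>\d+)/i, 'total']
end

# \d* accepts zero digits, so the match succeeds with an empty capture.
def total_with_star(line)
  line[/total[^\d]*(?<total>\d*)/i, 'total']
end

p total_with_plus(NO_AMOUNT)  # nil
p total_with_star(NO_AMOUNT)  # ""
```

Returning nil makes a missing total easier to detect than an empty string, which is one reason to prefer \d+ here.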

What is the regex to capture the total amount from a string?

3 Answers: