I'm trying to parse all the money amounts out of a string. For example, I want to extract:
['$250,000', '$3.90', '$250,000', '$500,000']
from:
'Up to $250,000………………………………… $3.90 Over $250,000 to $500,000'
The regex:
\$\ ?(\d+\,)*\d+(\.\d*)?
seems to match all the money expressions at this link. However, when I try to use scan in Ruby, it doesn't give me the results I want.
s # => "Up to $250,000 $3.90 Over $250,000 to $500,000, add$3.70 Over $500,000 to $1,000,000, add..$3.40 Over $1,000,000 to $2,000,000, add...........$2.25\nOver $2,000,000 add ..$2.00"
r # => /\$\ ?(\d+\,)*\d+\.?\d*/
s.scan(r)
# => [["250,"], [nil], ["250,"], ["500,"], [nil], ["500,"], ["000,"], [nil], ["000,"], ["000,"], [nil], ["000,"], [nil]]
From the String#scan documentation, it looks like this is because of the group. How can I parse all the money amounts in the string?
Answer 0: (score: 2)
Let's look at your regex, which I'll write in free-spacing mode so that it can be documented:
r = /
    \$       # match a dollar sign
    \ ?      # optionally match a space (has no effect)
    (        # begin capture group 1
      \d+    # match one or more digits
      ,      # match a comma (need not be escaped)
    )*       # end capture group 1 and execute it >= 0 times
    \d+      # match one or more digits
    \.?      # optionally match a period
    \d*      # match zero or more digits
    /x       # free-spacing regex definition mode
When not in free-spacing mode, it would be written as follows.
r = /\$ ?(\d+,)*\d+\.?\d*/
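When a regex is defined in free-spacing mode, all unescaped whitespace is removed before the regex is evaluated, which is why I had to escape the space. That is not necessary when the regex is not defined in free-spacing mode.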
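A small illustration of that whitespace rule, using made-up patterns and strings that are not part of the question: in free-spacing mode an unescaped space in the pattern is dropped, so it only matches a literal space when it is escaped.

```ruby
# In free-spacing mode (/x) unescaped whitespace in the pattern is ignored.
loose   = /a b/x     # behaves like /ab/
escaped = /a\ b/x    # the escaped space is kept as a literal space

loose.match?("ab")      #=> true
loose.match?("a b")     #=> false
escaped.match?("a b")   #=> true
```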
There is no need to match a space after the dollar sign, so \ ? should be removed. Suppose we now have
r = /\$\d+\.?\d*/
"$2.31 cat $44. dog $33.607".scan r
#=> ["$2.31", "$44.", "$33.607"]
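That works, but it is questionable whether you want to match values that do not have exactly two digits after the decimal point.
Now write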
r = /\$(\d+,)*\d+\.?\d*/
"$2.31 cat $44. dog $33.607".scan r
#=> [[nil], [nil], [nil]]
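To see why we get this result, check the documentation for String#scan, specifically the last sentence of the first paragraph: "If the pattern contains groups, each individual result is itself an array containing one entry per group."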
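To make that sentence from the documentation concrete, here is a minimal sketch with a made-up string (not from the question) showing how the return value of scan changes as soon as the pattern contains capture groups:

```ruby
s = "a1 b2 c3"

# No groups: each result is the matched string.
s.scan(/[a-z]\d/)       #=> ["a1", "b2", "c3"]

# Two capture groups: each result is an array with one entry per group.
s.scan(/([a-z])(\d)/)   #=> [["a", "1"], ["b", "2"], ["c", "3"]]

# A group that does not take part in a match contributes nil.
s.scan(/[a-z](x)?\d/)   #=> [[nil], [nil], [nil]]
```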
We can avoid the problem by changing the capture group to a non-capture group:
r = /\$(?:\d+,)*\d+\.?\d*/
"$2.31 cat $44. dog $33.607".scan r
#=> ["$2.31", "$44.", "$33.607"]
Now consider this:
"$2,241.31 cat $1,2345. dog $33.607".scan r
#=> ["$2,241.31", "$1,2345.", "$33.607"]
That's still not quite right. Try the following.
r = /
    \$            # match a dollar sign
    \d{1,3}       # match one to three digits
    (?:,\d{3})    # match ',' then 3 digits in a nc group
    *             # execute the above nc group >=0 times
    (?:\.\d{2})   # match '.' then 2 digits in a nc group
    ?             # optionally match the above nc group
    (?![\d,.])    # no following digit, ',' or '.'
    /x            # free-spacing regex definition mode
"$2,241.31 $2 $1,234 $3,6152 $33.607 $146.27".scan r
#=> ["$2,241.31", "$2", "$1,234", "$146.27"]
(?![\d,.]) is a negative lookahead: it requires that the match not be followed by a digit, a comma, or a period.
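As a reduced illustration of how a negative lookahead behaves (the pattern and string here are invented for the example, not taken from the question), compare a two-digit match with and without (?!\d):

```ruby
# (?!\d) succeeds only if the next character is not a digit,
# so the "$12" at the start of "$123" is rejected.
"$12a $123".scan(/\$\d{2}(?!\d)/)   #=> ["$12"]

# Without the lookahead, the first two digits of "$123" are matched as well.
"$12a $123".scan(/\$\d{2}/)         #=> ["$12", "$12"]
```

The lookahead consumes no characters; it only checks what follows the match.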
In normal mode, this regex is written as follows.
r = /\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?(?![\d,.])/
Without the negative lookahead at the end of the regex, we would get the following incorrect result.
r = /\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?/
"$2,241.31 $2 $1,234 $3,6152 $33.607 $146.27".scan r
#=> ["$2,241.31", "$2", "$1,234", "$3,615", "$33.60",
# "$146.27"]
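One thing to watch for: because the lookahead rejects any following ',' or '.', an amount that is directly followed by punctuation (such as "$500,000, add" in the question's string) is skipped as well. The sketch below shows this on a made-up string, together with a hypothetical variant, not part of the regex above, that rejects a following ',' or '.' only when a digit comes after it. Whether that relaxation is wanted depends on the data.

```ruby
s = "Over $250,000 to $500,000, add $3.70 or $3,6152 or $33.607."

# The regex from above: a following ',' or '.' always blocks the match.
strict = /\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?(?![\d,.])/
s.scan(strict)    #=> ["$250,000", "$3.70"]

# Hypothetical variant: ',' or '.' blocks the match only when a digit follows it.
relaxed = /\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?(?!\d|[,.]\d)/
s.scan(relaxed)   #=> ["$250,000", "$500,000", "$3.70"]
```

Both versions still reject the malformed values "$3,6152" and "$33.607".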
Answer 1: (score: 1)
[3] pry(main)> str = <<EOF
[3] pry(main)* Up to $250,000………………………………… $3.90 Over $250,000 to $500,000, add………………$3.70 Over $500,000 to $1,000,000, add……………..$3.40 Over $1,000,000 to $2,000,000, add……...........$2.25
[3] pry(main)* Over $2,000,000 add …..………………………$2.00
[3] pry(main)* EOF
=> "Up to $250,000………………………………… $3.90 Over $250,000 to $500,000, add………………$3.70 Over $500,000 to $1,000,000, add……………..$3.40 Over $1,000,000 to $2,000,000, add……...........$2.25\nOver $2,000,000 add …..………………………$2.00\n"
[4] pry(main)> str.scan /\$\d+(?:[,.]\d+)*/
=> ["$250,000", "$3.90", "$250,000", "$500,000", "$3.70", "$500,000", "$1,000,000", "$3.40", "$1,000,000", "$2,000,000", "$2.25", "$2,000,000", "$2.00"]
[5] pry(main)>
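For reference, the regex used in this answer can be read as follows in free-spacing mode. This is only an annotated sketch; the test string is borrowed from the first answer. Because the regex does not check how digits are grouped, it also accepts values such as "$1,2345" and "$33.607".

```ruby
r = /
  \$        # a dollar sign
  \d+       # one or more digits
  (?:       # begin a non-capture group
    [,.]    # a comma or a period
    \d+     # one or more digits
  )*        # end the group and repeat it zero or more times
/x

"$2,241.31 cat $1,2345. dog $33.607".scan(r)
#=> ["$2,241.31", "$1,2345", "$33.607"]
```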