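How can I write R code that uses a regular expression to extract all the numbers related to money or percentages from the sentence below? The R code should select the following: 39.7 percent, and dollar values such as $873,599 and $1 million.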
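My example text is:
While choices are low across all price ranges, interest in high-end homes remains strong, with 355 properties, or 37.6 percent of all home sales, attracting prices of more than $873,599 and $1 million.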
I tried $?[0-9,.]+Percent?Million?, but it did not work as expected.
Answer (score: 3)
[0-9]+(?:\.[0-9]+)?\s*(?:%|percent)|\$(?:[0-9]{3},)*[0-9]+(?:\s(?:thousand|million|billion|trillion))?
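This regular expression will do the following:
- match a number, with an optional decimal part, followed by a % symbol or the literal word percent
- match a dollar amount that starts with $, may contain comma-separated digits such as $873,599, and may be followed by the word thousand, million, billion, or trillion
Live demo
https://regex101.com/r/uG6mQ4/1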
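Sample text
While choices in 100% of price ranges are low, interest in high-end homes remains strong, with 355 properties, or 37.6 percent of all home sales, attracting prices of more than $873,599 and $1 million.
Sample matches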
[0][0] = 100%
[1][0] = 37.6 percent
[2][0] = $873,599
[3][0] = $1 million
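Since the question asks for R code, here is a minimal sketch of applying the answer's pattern in base R. It assumes the regmatches()/gregexpr() approach with perl = TRUE; the sentence is an English rendering of the sample text above, and every backslash in the pattern is doubled because it sits inside an R string literal.

```r
# The answer's regular expression, with each backslash doubled for an R string literal.
pattern <- "[0-9]+(?:\\.[0-9]+)?\\s*(?:%|percent)|\\$(?:[0-9]{3},)*[0-9]+(?:\\s(?:thousand|million|billion|trillion))?"

# English rendering of the sample text used in the answer.
text <- paste(
  "While choices in 100% of price ranges are low, interest in high-end homes",
  "remains strong, with 355 properties, or 37.6 percent of all home sales,",
  "attracting prices of more than $873,599 and $1 million."
)

# gregexpr() locates every non-overlapping match; perl = TRUE hands the pattern to PCRE,
# which supports the (?:...) non-capturing groups. regmatches() extracts the matched text.
matches <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
print(matches)
```

For this sentence the result should be the four strings listed under Sample matches: 100%, 37.6 percent, $873,599 and $1 million. The 355 is skipped because it is followed by neither % nor percent. For a character vector holding several sentences, drop the [[1]] to get one vector of matches per element.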
NODE EXPLANATION
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (optional (matching the most amount possible)):
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
% '%'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
percent 'percent'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\$ '$'
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times (matching the most amount possible)):
----------------------------------------------------------------------
[0-9]{3} any character of: '0' to '9' (3 times)
----------------------------------------------------------------------
, ','
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (optional (matching the most amount possible)):
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
thousand 'thousand'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
million 'million'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
billion 'billion'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
trillion 'trillion'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
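One limitation of the dollar branch: (?:[0-9]{3},)* only consumes groups of exactly three digits that end in a comma, so an amount whose first group has one or two digits is cut short (for $1,234,567 it returns just $1), and a decimal amount such as $1.5 million stops at $1. The sketch below is a hypothetical variant of that branch only, not part of the original answer; the variable names, the helper extract_all() and the test sentence are illustrative assumptions.

```r
# Dollar branch from the answer, for comparison.
original_dollar <- "\\$(?:[0-9]{3},)*[0-9]+(?:\\s(?:thousand|million|billion|trillion))?"

# Hypothetical variant: either a 1-3 digit group followed by one or more ",ddd" groups,
# or a plain run of digits; then an optional decimal part and an optional magnitude word.
variant_dollar <- "\\$(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\\.[0-9]+)?(?:\\s(?:thousand|million|billion|trillion))?"

# Illustrative sentence (not from the question).
examples <- "Prices ranged from $1,234,567 to $873,599, with listings near $1.5 million and one at $2500."

# Hypothetical helper: return every match of `pattern` in the single string `x`.
extract_all <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

before <- extract_all(original_dollar, examples)
after <- extract_all(variant_dollar, examples)
print(before)
print(after)
```

With this sentence, the original branch should return $1, $873,599, $1 and $2500, while the variant should return $1,234,567, $873,599, $1.5 million and $2500. To use it, substitute it for the part after the | in the full pattern. Malformed amounts such as $1,2345 are still only partly matched, so check the variant against your own data.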