
Question

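How can I write R code that uses a regular expression to extract every number related to money or percentages from the sentence below? The code should pick out the percentage 37.6 percent and dollar values such as $873,599 and $1 million.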

My sample text is:

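While choices in all price ranges are low, interest in high-end homes remains strong, with 355 properties, or 37.6 percent of all home sales, drawing prices above $873,599 and $1 million.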

I tried $?[0-9,.]+Percent?Million?, but it does not work as expected.
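A quick base-R check (using perl = TRUE) of why that attempt finds nothing: a ? makes only the single character before it optional, so Percent? means "Percen" plus an optional "t"; matching is case sensitive, so it never matches the lowercase "percent" in the text; and an unescaped $ is an end-of-line anchor, not a literal dollar sign. In this sketch the $ is escaped, so only the quantifier and case problems remain. The variable names are illustrative.

```r
# Sample sentence from the question
txt <- paste(
  "While choices in all price ranges are low, interest in high-end homes",
  "remains strong, with 355 properties, or 37.6 percent of all home sales,",
  "drawing prices above $873,599 and $1 million."
)

# "?" applies to one character: "Percent?" accepts "Percen" as well as "Percent"
print(grepl("Percent?", c("Percen", "Percent"), perl = TRUE))  # TRUE TRUE
# Matching is case sensitive, so the lowercase word in the text is not matched
print(grepl("Percent?", "percent", perl = TRUE))                # FALSE

# Even with "$" escaped, the pattern needs the literal run "Percen...Millio"
# right after the digits, which never occurs in the text: no matches at all
attempt <- "\\$?[0-9,.]+Percent?Million?"
attempt_hits <- regmatches(txt, gregexpr(attempt, txt, perl = TRUE))[[1]]
print(attempt_hits)                                              # character(0)
```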

Answer 1

Description

[0-9]+(?:\.[0-9]+)?\s*(?:%|percent)|\$(?:[0-9]{3},)*[0-9]+(?:\s(?:thousand|million|billion|trillion))?


This regular expression does the following:

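Find all numbers that represent a percentage, with or without a decimal point
- the number must be followed, optionally after whitespace, by a % sign or the word "percent"
Find all numbers that represent a dollar amount
- with a leading dollar sign
- possibly containing comma separators
- possibly followed by thousand, million, billion, or trillion
Skip all other numbers that are neither dollar amounts nor percentages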
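In R, one way to apply this pattern (a sketch using only base R) is gregexpr() to locate every match and regmatches() to pull out the matched text. perl = TRUE selects the PCRE engine, which understands the (?:...) groups and \s; inside an R string literal every backslash must be doubled. The names money_pct, txt, and amounts are illustrative.

```r
# The answer's pattern, with backslashes doubled for an R string literal
money_pct <- paste0(
  "[0-9]+(?:\\.[0-9]+)?\\s*(?:%|percent)",                                  # percentages
  "|\\$(?:[0-9]{3},)*[0-9]+(?:\\s(?:thousand|million|billion|trillion))?"  # dollar amounts
)

txt <- paste(
  "While choices in all price ranges are low, interest in high-end homes",
  "remains strong, with 355 properties, or 37.6 percent of all home sales,",
  "drawing prices above $873,599 and $1 million."
)

# gregexpr() finds every non-overlapping match; regmatches() extracts the text.
# [[1]] takes the result for the first (and only) input string.
amounts <- regmatches(txt, gregexpr(money_pct, txt, perl = TRUE))[[1]]
print(amounts)
# expected: "37.6 percent" "$873,599" "$1 million"  (355 is skipped)
```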

Example

Live Demo

https://regex101.com/r/uG6mQ4/1

Sample Text

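While choices in 100% of price ranges are low, interest in high-end homes remains strong, with 355 properties, or 37.6 percent of all home sales, drawing prices above $873,599 and $1 million.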

Sample Matches

[0][0] = 100%
[1][0] = 37.6 percent
[2][0] = $873,599
[3][0] = $1 million
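To reproduce these sample matches in R, the same base-R call can be run over a character vector; regmatches() then returns a list with one character vector per input string, and a string with no money or percentage values comes back as character(0). The second sentence is a made-up example, and the variable names are illustrative.

```r
money_pct <- paste0(
  "[0-9]+(?:\\.[0-9]+)?\\s*(?:%|percent)",
  "|\\$(?:[0-9]{3},)*[0-9]+(?:\\s(?:thousand|million|billion|trillion))?"
)

sentences <- c(
  paste(
    "While choices in 100% of price ranges are low, interest in high-end homes",
    "remains strong, with 355 properties, or 37.6 percent of all home sales,",
    "drawing prices above $873,599 and $1 million."
  ),
  "355 properties were listed."  # hypothetical sentence with no money or percentages
)

# One list element per input string
hits <- regmatches(sentences, gregexpr(money_pct, sentences, perl = TRUE))
print(hits[[1]])  # "100%" "37.6 percent" "$873,599" "$1 million"
print(hits[[2]])  # character(0)
```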

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    %                        '%'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    percent                  'percent'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  \$                       '$'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    [0-9]{3}                 any character of: '0' to '9' (3 times)
----------------------------------------------------------------------
    ,                        ','
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      thousand                 'thousand'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      million                  'million'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      billion                  'billion'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      trillion                 'trillion'
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
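One limitation is visible in this breakdown. (?:[0-9]{3},)* only consumes groups of exactly three digits followed by a comma, so an amount whose leading group has one or two digits, such as $1,250,000, is cut off at $1. A decimal amount such as $2.5 million is likewise cut off at $2. The variant below is my own adjustment, not part of the answer. It accepts a 1 to 3 digit leading group followed by comma groups, or a plain run of digits, plus an optional decimal part. The test string and the helper name extract_all are hypothetical.

```r
extract_all <- function(pattern, x) {
  # Return every match of pattern in the single string x (PCRE syntax)
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

pct <- "[0-9]+(?:\\.[0-9]+)?\\s*(?:%|percent)"
scale_words <- "(?:\\s(?:thousand|million|billion|trillion))?"

original <- paste0(pct, "|\\$(?:[0-9]{3},)*[0-9]+", scale_words)
# Variant: "1,250,000"-style comma groups OR plain digits, then an optional ".5" part
variant <- paste0(pct, "|\\$(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\\.[0-9]+)?", scale_words)

s <- "Budgets of $1,250,000 and $2.5 million"
print(extract_all(original, s))  # "$1" "$2"
print(extract_all(variant, s))   # "$1,250,000" "$2.5 million"
```

On the question's sentence, the variant returns the same three matches as the original pattern.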
