Using R

Posted: 2016-05-15 12:18:07

Tags: regex r

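How can I write R code that uses a regular expression to extract every number related to money or percentages from the sentence below? The code should pick out values such as 37.6 percent, and dollar amounts such as $873,599 and $1 million.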

My sample text is:

  

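Although selection was low across all price ranges, interest in high-end homes remained strong, with 355 properties, or 37.6 percent of all home sales, fetching more than $873,599 and $1 million.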

I tried the pattern $?[0-9,.]+Percent?Million? but it did not work as expected.

1 answer:

Answer 0 (score: 3)

Description

[0-9]+(?:\.[0-9]+)?\s*(?:%|percent)|\$(?:[0-9]{3},)*[0-9]+(?:\s(?:thousand|million|billion|trillion))?
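In R, a pattern like this can be applied with base R's gregexpr() and regmatches(). The sketch below assumes perl = TRUE (the PCRE engine) and an English rendering of the question's sentence whose exact wording is an assumption. Backslashes are doubled because the pattern is stored in an R string.

```r
# The answer's pattern; backslashes are doubled because it sits in an R string.
money_pct_pattern <- "[0-9]+(?:\\.[0-9]+)?\\s*(?:%|percent)|\\$(?:[0-9]{3},)*[0-9]+(?:\\s(?:thousand|million|billion|trillion))?"

# Assumed English wording of the question's sample sentence.
txt <- paste(
  "Although selection was low across all price ranges, interest in high-end",
  "homes remained strong, with 355 properties, or 37.6 percent of all home",
  "sales, fetching more than $873,599 and $1 million."
)

# gregexpr() locates every match; perl = TRUE selects the PCRE engine.
# regmatches() turns the match positions into strings; [[1]] takes the
# result for the single input string.
matches <- regmatches(txt, gregexpr(money_pct_pattern, txt, perl = TRUE))[[1]]
print(matches)
# expected: "37.6 percent", "$873,599", "$1 million"
```

The count 355 is not returned because it is followed by neither % nor percent, and it has no leading $.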


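This regular expression will do the following:

  • Find all numbers that represent percentages, with or without a decimal point
    • the number is followed by a % sign or the word percent, optionally separated by whitespace
  • Find all numbers that are dollar amounts
    • with a leading dollar sign
    • possibly containing comma separators, where each comma follows a group of exactly three digits
    • possibly followed by a word such as thousand, million, billion, or trillion
  • Avoid all other numbers that are neither dollar amounts nor percentages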
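A quick base-R check of the behavior listed above, as a sketch assuming perl = TRUE: bare numbers such as 355 and 2016 are skipped, and % or percent is accepted with or without a preceding space. The same check also exposes a limitation. The group (?:[0-9]{3},)* only consumes comma groups that are exactly three digits long, so an amount such as $1,234,567, whose leading group is shorter, is truncated to $1. The variant_pattern below is a hypothetical adjustment for that case, not part of the original answer.

```r
# The answer's pattern, unchanged.
answer_pattern <- "[0-9]+(?:\\.[0-9]+)?\\s*(?:%|percent)|\\$(?:[0-9]{3},)*[0-9]+(?:\\s(?:thousand|million|billion|trillion))?"

# Hypothetical variant (not part of the answer): the dollar branch accepts a
# 1-3 digit leading group followed by one or more ",ddd" groups, or else a
# plain run of digits.
variant_pattern <- "[0-9]+(?:\\.[0-9]+)?\\s*(?:%|percent)|\\$(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\\s(?:thousand|million|billion|trillion))?"

# Return every match of `pattern` in the single string `x`.
extract_all <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

probe <- "Skipped: 355 and 2016. Matched: 37.6percent, 12 % and $1,234,567."

print(extract_all(answer_pattern, probe))   # "$1,234,567" is cut to "$1"
print(extract_all(variant_pattern, probe))  # the full "$1,234,567" is kept
```

The percent branch is also case-sensitive, so Percent would not match. With perl = TRUE, prefixing the pattern with (?i), or passing ignore.case = TRUE to gregexpr(), would accept it as well.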

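Example

Live Demo

https://regex101.com/r/uG6mQ4/1

Sample Text

While selection is low across 100% of price ranges, interest in high-end homes remains strong, with 355 properties, or 37.6 percent of all home sales, fetching more than $873,599 and $1 million.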

Sample Matches

[0][0] = 100%
[1][0] = 37.6 percent
[2][0] = $873,599
[3][0] = $1 million

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    %                        '%'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    percent                  'percent'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  \$                       '$'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    [0-9]{3}                 any character of: '0' to '9' (3 times)
----------------------------------------------------------------------
    ,                        ','
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      thousand                 'thousand'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      million                  'million'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      billion                  'billion'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      trillion                 'trillion'
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
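The table can be mirrored in R by assembling the pattern from commented pieces with paste0(), one piece per NODE row. This is a sketch: the pieces are copied from the answer's pattern, and the English wording of the sample text is an assumption. The assembled string is checked to be identical to the compact one. Applied with perl = TRUE, it should reproduce the four sample matches.

```r
# Pieces mirror the NODE column of the table above, in order.
verbose_pattern <- paste0(
  "[0-9]+",                # one or more digits
  "(?:\\.[0-9]+)?",        # optional '.' followed by one or more digits
  "\\s*",                  # zero or more whitespace characters
  "(?:%|percent)",         # '%' or the word 'percent'
  "|",                     # OR
  "\\$",                   # a literal '$'
  "(?:[0-9]{3},)*",        # zero or more groups of three digits plus ','
  "[0-9]+",                # one or more digits
  "(?:\\s(?:thousand|million|billion|trillion))?"  # optional whitespace + magnitude word
)

# The compact pattern exactly as written in the answer.
compact_pattern <- "[0-9]+(?:\\.[0-9]+)?\\s*(?:%|percent)|\\$(?:[0-9]{3},)*[0-9]+(?:\\s(?:thousand|million|billion|trillion))?"
stopifnot(identical(verbose_pattern, compact_pattern))

# Assumed English wording of the answer's sample text.
sample_text <- paste(
  "While selection is low across 100% of price ranges, interest in high-end",
  "homes remains strong, with 355 properties, or 37.6 percent of all home",
  "sales, fetching more than $873,599 and $1 million."
)

hits <- regmatches(sample_text, gregexpr(verbose_pattern, sample_text, perl = TRUE))[[1]]
print(hits)
# expected: "100%", "37.6 percent", "$873,599", "$1 million"
```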