Regex to match less than 500 feet with decimal places in a string

Time: 2019-01-18 19:13:48

Tags: python regex

I am trying to use regular expressions to determine if a string contains a value of less than 500 feet. Importantly, there are some key constraints and assumptions to the matching:

  • Can assume commas have been stripped. Decimals are guaranteed to be . and not ,
  • Cannot assume the numeric value is preceded by a space.
  • Can assume "feet" will be written as "ft" or "feet"
  • Can assume lowercase
  • Decimals can be any length
  • There may be any number of spaces between the number and word "feet" or "ft"

My attempts thus far:

Attempt 1

\b[1-4]{0,1}[0-9]{1,2}(\.[0-9]{1,}}){0,1} {0,}(ft|feet)\b

This was close, but it fails to account for decimals and matches inside values like 1000.5 ft (matching "5 ft").

Attempt 2

My next attempt was to include a negative lookbehind to make sure the match wasn't preceded by a . or any number.

(?<!(\.|[0-9]))([1-4]{0,1}[0-9]{1,2}(\.[0-9]{1,}}){0,1} {0,}(ft|feet))\b

Unfortunately, this doesn't match any decimals now (e.g., 5.5 ft should match but doesn't). I suspect I'm misunderstanding how negative lookbehind works.

I would appreciate any help understanding where I'm going wrong!

Test Cases:

  • "1 ft tall" - match
  • "1ft tall" - match
  • "1.1ft tall" - match
  • "He is 6 feet tall" - match
  • "499.555 feet tall" - match
  • "He is 2 m tall" - no match
  • "500 feet tall" - no match
  • "The building is 1000.405 ft tall" - no match

1 Answer:

Answer 0: (score: 1)

You can use

r"(?<!\d\.)(?<!\d)(?:[1-9]|[1-9]\d|[1-4]\d\d)(?:\.\d+)?\s*f(?:ee)?t\b"

See the regex demo.

Details
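As a quick check, the pattern can be run against the test cases from the question. This is a minimal sketch using Python's re module; the names FEET_UNDER_500 and has_under_500_feet are made up for illustration and are not part of the original answer.

```python
import re

# Pattern from the answer above, compiled once for reuse.
FEET_UNDER_500 = re.compile(
    r"(?<!\d\.)(?<!\d)(?:[1-9]|[1-9]\d|[1-4]\d\d)(?:\.\d+)?\s*f(?:ee)?t\b"
)


def has_under_500_feet(text):
    """Return True if text contains a value from 1 up to (not including) 500 feet."""
    return FEET_UNDER_500.search(text) is not None


# Test cases copied from the question: text -> whether a match is expected.
cases = {
    "1 ft tall": True,
    "1ft tall": True,
    "1.1ft tall": True,
    "He is 6 feet tall": True,
    "499.555 feet tall": True,
    "He is 2 m tall": False,
    "500 feet tall": False,
    "The building is 1000.405 ft tall": False,
}

for text, expected in cases.items():
    assert has_under_500_feet(text) == expected, text
```

Because the integer part must start with [1-9], this pattern does not match values below 1 such as "0.5 ft". If those should count, the integer alternatives would need to be extended.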

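  • (?<!\d\.)(?<!\d) - two negative lookbehinds that ensure the current position is not preceded by a digit followed by a dot, or by a digit alone
  • (?:[1-9]|[1-9]\d|[1-4]\d\d) - a non-capturing group matching a whole number from 1 to 499:
    • [1-9] - from 1 to 9
    • [1-9]\d - from 10 to 99
    • [1-4]\d\d - from 100 to 499
  • (?:\.\d+)? - an optional non-capturing group matching a dot followed by 1 or more digits
  • \s* - 0 or more whitespace characters
  • f(?:ee)?t - ft or feet (but not fet)
  • \b - a word boundary.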
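
As for why Attempt 2 stopped matching decimals: the lookbehind itself looks fine. The likely culprit is the doubled closing brace in (\.[0-9]{1,}}), since Python's re treats the second } as a literal character, so the decimal group can only match when the digits are followed by an actual "}". The sketch below compares the pattern as posted with a variant that has the extra brace removed. It assumes the doubled brace was in the real code and not a transcription slip, and the variable names are illustrative only.

```python
import re

# Attempt 2 exactly as posted in the question, including the doubled "}}".
posted = r"(?<!(\.|[0-9]))([1-4]{0,1}[0-9]{1,2}(\.[0-9]{1,}}){0,1} {0,}(ft|feet))\b"

# Same pattern with the extra "}" removed (an assumed correction).
fixed = r"(?<!(\.|[0-9]))([1-4]{0,1}[0-9]{1,2}(\.[0-9]{1,}){0,1} {0,}(ft|feet))\b"

# As posted, the decimal part cannot be consumed, so "5.5 ft" finds no match.
posted_hit = re.search(posted, "5.5 ft")
assert posted_hit is None

# The posted pattern only accepts a decimal that ends in a literal "}".
brace_hit = re.search(posted, "5.5} ft")
assert brace_hit is not None

# With the brace removed, the decimal case matches in full.
fixed_hit = re.search(fixed, "5.5 ft")
assert fixed_hit is not None and fixed_hit.group(0) == "5.5 ft"

# The lookbehind still rejects digits that sit inside a larger number.
assert re.search(fixed, "The building is 1000.405 ft tall") is None
assert re.search(fixed, "500 feet tall") is None
```

Note that the corrected Attempt 2 still allows a leading 0 in the integer part, so it is not equivalent to the pattern in this answer.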