Question

Consider the following two examples:

https://stackoverflow.com

(https|http):\/\/.*

The first is an ordinary URL; the second is a regex string. In Python 3, how can I tell which string is a regular expression and which is not?

Answer 1

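Both strings are potentially valid regular expressions, and both can be used as patterns in Python. The only thing you can do is identify strings that are definitely not valid regular expressions: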

import re

re.compile('https://stackoverflow.com')
# re.compile('https://stackoverflow.com')

re.compile(r'(https|http):\/\/.*')
# re.compile('(https|http):\\/\\/.*')

re.compile('(?:http|?:https)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/re.py", line 224, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.5/re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.5/sre_parse.py", line 778, in _parse
    p = _parse_sub(source, state)
  File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.5/sre_parse.py", line 638, in _parse
    source.tell() - here + len(this))
sre_constants.error: nothing to repeat at position 8
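
Rather than reading the traceback, the same failure can be caught and inspected. The following is a minimal sketch, not part of the original answer; the variable name `failure` is only illustrative. It relies on the `msg` and `pos` attributes that `re.error` has carried since Python 3.5.

```python
import re

pattern = "(?:http|?:https)"

try:
    re.compile(pattern)
except re.error as exc:
    # exc.msg is the bare message, exc.pos the offset where parsing failed
    failure = (exc.msg, exc.pos)
    print(pattern)
    print(" " * exc.pos + "^ " + exc.msg)
else:
    # Only reached when the pattern compiles without error
    failure = None
```

The caret lands under the second `?`, which has nothing before it to repeat because it directly follows the `|`.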

Answer 2

Both are strings, and both will compile as regular expressions; what matters is how you use them.

The following performs a string comparison:

>>> x = "https://stackoverflow.com"
>>> if x == "https://stackoverflow.com":
...   print("true")
...
true

The following performs a regular expression match:

>>> import re
>>> x = "https://stackoverflow.com"
>>> if re.match(r"(https|http):\/\/.*", x):
...   print("true")
...
true
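
To make the "it depends on how you use them" point concrete, here is a small sketch (the string `other` is a made-up example, not from the question). It shows that the plain URL behaves differently when it is treated as a pattern, because the unescaped `.` matches any character. `re.escape` is one way to get a pattern that matches the string literally.

```python
import re

url = "https://stackoverflow.com"
other = "https://stackoverflowXcom"   # hypothetical near-miss string

# Plain string comparison: only an exact copy is equal.
print(url == other)                               # False

# Used as a pattern, the "." in the URL matches the "X".
print(bool(re.fullmatch(url, other)))             # True

# After re.escape(), the pattern matches only the original text.
print(bool(re.fullmatch(re.escape(url), other)))  # False
print(bool(re.fullmatch(re.escape(url), url)))    # True
```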

Answer 3

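You are using the wrong example to illustrate a legitimate question. Regular expressions and non-regex strings can share the same alphabet, but they differ in their rules. If a string follows the rules of regular expression syntax, then it is also a regex string.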

Going beyond @match's answer, you can test whether a string is a valid regular expression with a try/except clause:

import re

try:
    re.compile("MYNOTYETREGEXSTRING")
    print("It's regexp!")
except re.error:  # raised when the string is not valid regex syntax
    print("It doesn't belong to the regexp grammar")
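
If you need this check in more than one place, it could be wrapped in a small helper. This is only a sketch: the name `is_valid_regex` is hypothetical, and it assumes the input is already a `str` (passing something else to `re.compile` raises `TypeError`, which is deliberately not caught here).

```python
import re

def is_valid_regex(candidate):
    """Return True if `candidate` compiles as a Python regular expression."""
    try:
        re.compile(candidate)
    except re.error:
        return False
    return True

# The three strings discussed in this thread
candidates = [
    "https://stackoverflow.com",
    r"(https|http):\/\/.*",
    "(?:http|?:https)",
]
for candidate in candidates:
    print(candidate, is_valid_regex(candidate))
```

Note that the plain URL also passes, so this only rules strings out; it cannot tell you whether the author intended a string to be a pattern.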

How to distinguish regex strings from non-regex strings in Python 3
