Question

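What is a regular expression that matches any valid Python integer literal in a string? It should support all the extra stuff like o and l, but it should not match floats or variables that have digits in them. I am using Python's re, so any syntax it supports is fine.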

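Edit: here is my motivation (apparently that matters). I am trying to fix http://code.google.com/p/sympy/issues/detail?id=3182. What I want to do is create a hook for IPython that automatically converts int/int (such as 1/2) to Rational(int, int) (such as Rational(1, 2)). The reason is that otherwise there is no way to make 1/2 register as a rational number, because it is Python type __div__ Python type. In SymPy this can be quite annoying, because something like x**(1/2) creates x**0 (or x**0.5 with __future__ division or Python 3), when what you want is x**Rational(1, 2), an exact quantity.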

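My solution is to add a hook to IPython that automatically wraps every integer literal in the input with Integer (SymPy's custom integer class, which gives a Rational on division). This will let me add an option to isympy that makes SymPy behave more like a traditional computer algebra system in this respect, for those who want it. I hope this explains why I need it to match any and all integer literals inside an arbitrary Python expression, which is why it must not match float literals or variables with digits in their names.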

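Also, since everyone is so interested in what I tried, here it is: not much before I gave up (regular expressions are hard). I played with (?!\.) to keep it from catching the first part of a float literal, but that did not seem to work (if anyone can tell me why, I would be curious; an example is re.sub(r"(\d*(?!\.))", r"S\(\1\)", "12.1")).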

Edit 2: Since I plan to use this together with re.sub, feel free to wrap the whole thing in parentheses in your answers so that I can use the match in the replacement :)

Answer 1

The syntax is described at http://docs.python.org/reference/lexical_analysis.html#integers. Here is one way to express it as a regular expression:

(0|[1-9][0-9]*|0[oO]?[0-7]+|0[xX][0-9a-fA-F]+|0[bB][01]+)[lL]?

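Disclaimer: this does not support negative integers, because in Python the - in something like -31 is not actually part of the integer literal but a separate operator.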
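As a rough check of the pattern above, here is a minimal sketch for Python 3. The names INT_RE and is_int_literal are only illustrative and are not part of the original answer. It tests whole strings with re.fullmatch, and it also shows one thing to keep in mind: Python's re takes the first alternative that succeeds, so with unanchored matching the bare 0 branch wins on prefixed literals such as 0xFF.

```python
import re

# The pattern from the answer above, copied unchanged.
INT_RE = re.compile(
    r"(0|[1-9][0-9]*|0[oO]?[0-7]+|0[xX][0-9a-fA-F]+|0[bB][01]+)[lL]?"
)

def is_int_literal(text):
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return INT_RE.fullmatch(text) is not None

for candidate in ["0", "42", "0o17", "017", "0xFF", "0b101", "42L", "1.5", "x1", "-31"]:
    print(candidate, is_int_literal(candidate))

# Unanchored matching stops at the first alternative that works,
# so only the leading "0" of "0xFF" is reported here.
print(INT_RE.match("0xFF").group(0))
```

If the pattern is meant for re.sub or re.findall on a larger expression, moving the 0 alternative to the end of the group, or adding boundary assertions around the pattern, should avoid that partial match.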

Answer 2

The definition of the integer literal is (in 3.x; it is slightly different in 2.x):

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"+
nonzerodigit   ::=  "1"..."9"
digit          ::=  "0"..."9"
octinteger     ::=  "0" ("o" | "O") octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
octdigit       ::=  "0"..."7"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
bindigit       ::=  "0" | "1"

So, something like this:

[1-9]\d*|0+|0[oO][0-7]+|0[xX][\da-fA-F]+|0[bB][01]+
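One way to sanity-check a pattern like this is to compare it with what the running interpreter accepts. The sketch below assumes Python 3.8 or later; PY3_INT, regex_says_int and python_says_int are hypothetical names. It uses a variant of the pattern that writes the zero case as 0+, following the "0"+ rule in the grammar, and it asks ast.parse whether each candidate is a bare int literal.

```python
import ast
import re

# Variant of the 3.x pattern, with "0"+ from the grammar written as 0+.
PY3_INT = re.compile(r"[1-9]\d*|0+|0[oO][0-7]+|0[xX][\da-fA-F]+|0[bB][01]+")

def regex_says_int(text):
    """True if the whole string matches the regex."""
    return PY3_INT.fullmatch(text) is not None

def python_says_int(text):
    """True if the interpreter parses the string as a single int literal."""
    try:
        node = ast.parse(text, mode="eval").body
    except SyntaxError:
        return False
    return isinstance(node, ast.Constant) and type(node.value) is int

CANDIDATES = ["0", "00", "7", "10", "0o17", "0X1f", "0b11", "017", "1.0", "x1", "42L"]

for candidate in CANDIDATES:
    print(candidate, regex_says_int(candidate), python_says_int(candidate))
```

The two columns agree on these candidates. Newer Python versions also allow underscores in numeric literals (for example 1_000), which the grammar quoted here does not cover, so the regex would reject those.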

Since you say you want to support "l", I guess you actually want the 2.x definition:

longinteger    ::=  integer ("l" | "L")
integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
nonzerodigit   ::=  "1"..."9"
octdigit       ::=  "0"..."7"
bindigit       ::=  "0" | "1"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"

which can be written as

(?:[1-9]\d*|0|0[oO]?[0-7]+|0[xX][\da-fA-F]+|0[bB][01]+)[lL]?

Answer 3

I am not convinced that re is the way to go. Python has the tokenize, ast, symbol and parser modules, which can be used to parse/process/manipulate/rewrite Python code...

>>> s = "33.2 + 6 * 0xFF - 0744"
>>> from StringIO import StringIO
>>> import tokenize
>>> t = list(tokenize.generate_tokens(StringIO(s).readline))
>>> t
[(2, '33.2', (1, 0), (1, 4), '33.2 + 6 * 0xFF - 0744'), (51, '+', (1, 5), (1, 6), '33.2 + 6 * 0xFF - 0744'), (2, '6', (1, 7), (1, 8), '33.2 + 6 * 0xFF - 0744'), (51, '*', (1, 9), (1, 10), '33.2 + 6 * 0xFF - 0744'), (2, '0xFF', (1, 11), (1, 15), '33.2 + 6 * 0xFF - 0744'), (51, '-', (1, 16), (1, 17), '33.2 + 6 * 0xFF - 0744'), (2, '0744', (1, 18), (1, 22), '33.2 + 6 * 0xFF - 0744'), (0, '', (2, 0), (2, 0), '')]
>>> nums = [eval(i[1]) for i in t if i[0] == tokenize.NUMBER]
>>> nums
[33.2, 6, 255, 484]
>>> print map(type, nums)
[<type 'float'>, <type 'int'>, <type 'int'>, <type 'int'>]

http://docs.python.org/library/tokenize.html has an example that rewrites floats as decimal.Decimal.
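Along the same lines, here is a minimal Python 3 sketch of the tokenize approach applied to the question's use case. The function wrap_int_literals and its wrapper parameter are hypothetical names, and fractions.Fraction is used only as a standard-library stand-in for SymPy's Integer so that the example runs without SymPy. It wraps every NUMBER token that evaluates to an int and leaves floats and names alone.

```python
import ast
import io
import tokenize
from fractions import Fraction

def wrap_int_literals(source, wrapper="Integer"):
    """Return source with each integer literal rewritten as wrapper(literal)."""
    result = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for toknum, tokval, _, _, _ in tokens:
        # NUMBER covers ints, floats and imaginary literals; keep only ints.
        if toknum == tokenize.NUMBER and isinstance(ast.literal_eval(tokval), int):
            result.extend([
                (tokenize.NAME, wrapper),
                (tokenize.OP, "("),
                (tokenize.NUMBER, tokval),
                (tokenize.OP, ")"),
            ])
        else:
            result.append((toknum, tokval))
    # With 2-tuples, untokenize picks its own spacing, so the text is
    # equivalent code rather than a character-for-character copy.
    return tokenize.untokenize(result)

rewritten = wrap_int_literals("1/2 + 0x10", wrapper="Fraction")
print(rewritten)
print(eval(rewritten, {"Fraction": Fraction}))
```

Because the tokenizer already knows where literals begin and end, digits inside names, strings, comments and float exponents are not touched.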

Answer 4

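If you really want to match both "dialects", you will run into some ambiguities, for example with octals (the o is required in Python 3). But the following should work: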

r = r"""(?xi) # Verbose, case-insensitive regex
(?<!\.)       # Assert no dot before the number
\b            # Start of number
(?:           # Match one of the following:
 0x[0-9a-f]+| # Hexadecimal number
 0o?[0-7]+|   # Octal number
 0b[01]+|     # Binary number
 0+|          # Zero
 [1-9]\d*     # Other decimal number
)             # End of alternation
L?            # Optional Long integer
\b            # End of number
(?!\.)        # Assert no dot after the number"""
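To show how this pattern could be used for the rewriting described in the question, here is a minimal sketch. The name wrap_integers and the default wrapper "Integer" are assumptions made for illustration; the regex itself is copied from above without changes.

```python
import re

# The verbose pattern from the answer above, copied unchanged.
INT_LITERAL = re.compile(r"""(?xi) # Verbose, case-insensitive regex
(?<!\.)       # Assert no dot before the number
\b            # Start of number
(?:           # Match one of the following:
 0x[0-9a-f]+| # Hexadecimal number
 0o?[0-7]+|   # Octal number
 0b[01]+|     # Binary number
 0+|          # Zero
 [1-9]\d*     # Other decimal number
)             # End of alternation
L?            # Optional Long integer
\b            # End of number
(?!\.)        # Assert no dot after the number""")

def wrap_integers(source, wrapper="Integer"):
    """Wrap every match of INT_LITERAL in wrapper(...) (hypothetical helper)."""
    return INT_LITERAL.sub(lambda m: "%s(%s)" % (wrapper, m.group(0)), source)

print(wrap_integers("x**(1/2) + 12.1 + 0xFF + a1"))

# A limitation of working on raw text: the exponent of a float is matched too.
print(wrap_integers("1e-5"))
```

The first call leaves 12.1 and a1 alone and wraps 1, 2 and 0xFF. The second call shows that a negative float exponent is still picked up, and digits inside string literals or comments would be wrapped as well, since a regex on raw text cannot tell those contexts apart.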

Answer 5

Would something like this suffice?

r = r"""
(?<![\w.])               #Start of string or non-alpha non-decimal point
(?:                      #Group the alternatives so both assertions apply to each
    0[X][0-9A-F]+L?|     #Hexadecimal
    0[O][0-7]+L?|        #Octal
    0[B][01]+L?|         #Binary
    [1-9]\d*L?           #Decimal/Long Decimal, will not match 0____
)
(?![\w.])                #End of string or non-alpha non-decimal point
"""

(with the flags re.VERBOSE | re.IGNORECASE)

Answer 6

This comes pretty close:

re.match(r'^(0[xob])?\d+[Ll]?$', '0o123l')
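To see where "pretty close" falls short, here is a small sketch using a form of the pattern with the character classes written as [xob] and [Ll]. CLOSE_RE is a hypothetical name. Since \d+ is used for every base, hex digits a-f are rejected and out-of-range digits in binary or octal literals are accepted.

```python
import re

# Same idea as the one-liner above, compiled once for reuse.
CLOSE_RE = re.compile(r"^(0[xob])?\d+[Ll]?$")

def looks_like_int(text):
    """True if the whole string matches the approximate pattern."""
    return CLOSE_RE.match(text) is not None

for candidate in ["0o123l", "42", "0x10", "0xFF", "0b12", "0o89"]:
    print(candidate, looks_like_int(candidate))
```

Here 0xFF is missed, while 0b12 and 0o89 are accepted even though Python rejects them, so a per-base digit class as in the other answers is the safer choice.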

Regular expression to match Python integer literals
