Question

I have the following text:

s1 = 'Promo Tier 77 (4.89 USD)'
s2 = 'Promo (11.50 USD) Tier 1 Titles Only'

From these I want to extract the numbers that are not enclosed in parentheses. The result would be:

s1 --> '77'
s2 --> '1'

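I'm currently using a weak regular expression, re.findall('\s\d+\s', s1). What would the correct regular expression be? Something like re.findall('\d+', s1), but excluding anything inside the parentheses.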

>>> re.findall('\d+',s1)
['77', '4', '89'] # two of these numbers are within the parenthetical. 
                  # I only want '77'

Answer 1

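One approach I find useful is to use the alternation operator in context: put what you want to exclude on the left-hand side (in effect saying "throw this away, it's garbage") and put what you want to match in a capturing group on the right-hand side.

You can then combine this with filter, or use a list comprehension, to remove the empty list items that the regex engine produces whenever the left-hand side of the alternation matches.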

>>> import re
>>> s = """Promo (11.50 USD) Tier 1 Titles Only
Promo (11.50 USD) (10.50 USD, 11.50 USD) Tier 5
Promo Tier 77 (4.89 USD)"""
>>> list(filter(None, re.findall(r'\([^)]*\)|(\d+)', s)))  # list() is needed on Python 3, where filter returns an iterator
['1', '5', '77']
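
The same alternation trick can also be written with a list comprehension instead of filter. The following is only a minimal sketch: the helper name numbers_outside_parens is hypothetical (it does not appear in the answer), and it assumes the parentheses are balanced and not nested.

```python
import re

# Left alternative swallows a whole "(...)" group without capturing it;
# right alternative captures a run of digits found outside parentheses.
PATTERN = re.compile(r'\([^)]*\)|(\d+)')

def numbers_outside_parens(text):
    # Hypothetical helper name, used here for illustration only.
    # findall yields '' whenever the left alternative matched, so drop those.
    return [m for m in PATTERN.findall(text) if m]

print(numbers_outside_parens('Promo Tier 77 (4.89 USD)'))
print(numbers_outside_parens('Promo (11.50 USD) Tier 1 Titles Only'))
```

With the two strings from the question this should give ['77'] and ['1'] respectively.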

Answer 2

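You can create a temporary string with the parenthesized parts removed and then run your code on it. I substitute a space so that the numbers before and after a removed part cannot get joined together.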

>>> import re
>>> s = 'Promo Tier 77 (11.50 USD) Tier 1 Titles Only'
>>> temp = re.sub(r'\(.*?\)', ' ', s)
>>> temp
'Promo Tier 77   Tier 1 Titles Only'
>>> re.findall(r'\d+', temp)
['77', '1']

You can, of course, reduce this to a single line.
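
A possible single-line form, nesting the re.sub call inside re.findall. This is a sketch assuming the same sample string and non-nested parentheses; the variable name numbers is only illustrative.

```python
import re

s = 'Promo Tier 77 (11.50 USD) Tier 1 Titles Only'

# Replace each "(...)" group with a space, then collect the remaining digit runs.
numbers = re.findall(r'\d+', re.sub(r'\(.*?\)', ' ', s))
print(numbers)  # ['77', '1']
```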

Answer 3

Do some splitting on your string. For example:

import re

s1 = "Promo Tier 77 (4.89 USD)"
for chunk in s1.split(")"):
    # keep only the text before the opening parenthesis, if there is one
    outside = chunk.split("(")[0]
    for number in re.findall(r"\d+", outside):
        print(number)

Answer 4

(\b\d+\b)(?=(?:[^()]*\([^)]*\))*[^()]*$)

Try this. Grab the capture group. See the demo.

http://regex101.com/r/gT6kI4/7
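
For reference, here is a sketch of how this pattern might be used from Python. The lookahead only succeeds when everything after the number, up to the end of the string, consists of complete "(...)" groups and text without parentheses, so a number sitting inside an unclosed group is rejected. The name OUTSIDE_PARENS is hypothetical, and the sketch assumes single-line input with balanced, non-nested parentheses.

```python
import re

# Pattern copied from the answer; the constant name is illustrative only.
OUTSIDE_PARENS = re.compile(r'(\b\d+\b)(?=(?:[^()]*\([^)]*\))*[^()]*$)')

s1 = 'Promo Tier 77 (4.89 USD)'
s2 = 'Promo (11.50 USD) Tier 1 Titles Only'

# findall returns the contents of the single capture group for each match.
result1 = OUTSIDE_PARENS.findall(s1)
result2 = OUTSIDE_PARENS.findall(s2)
print(result1)  # ['77']
print(result2)  # ['1']
```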

Parsing a string outside of parenthetical expressions
