Capture the number after a phrase

Time: 2019-04-25 11:05:14

Tags: python regex string

I have strings like these:

  1. Your signing bonus is 123,000
  2. This year signing bonus is bad. the signing bonus for this year is EUR 123,000
  3. The bonus is 14,456, but signing bonus.

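I want output along these lines:

a) If signing bonus is followed by a number, keep that part of the string and drop everything before it. See expected outputs 1 and 2.

b) If there is no number after signing bonus, I should get the first part of the string instead. See expected output 3.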

Expected output

  1. is 123,000
  2. for this year is EUR 123,000
  3. The bonus is 14,456, but

My regex:

match1 = re.findall(r'(?<=\bSigning Bonus\b)\s*(?:\S+\b\s*){0,8}',value, re.I|re.M|re.DOTALL)

It handles outputs 1 and 2, but not output 3.
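A quick check, as a sketch, of why the third case fails: a lookbehind only fixes where the match starts, so this pattern can only ever return text that follows the phrase. The wrapper name `after_phrase` and the constant `PATTERN` are illustrative only.

```python
import re

# The pattern from the question, unchanged.
PATTERN = r'(?<=\bSigning Bonus\b)\s*(?:\S+\b\s*){0,8}'

def after_phrase(value):
    # Hypothetical wrapper around the original findall call.
    return re.findall(PATTERN, value, re.I | re.M | re.DOTALL)

print(after_phrase("Your signing bonus is 123,000"))
print(after_phrase("The bonus is 14,456, but signing bonus."))
```

For the third string the only match is an empty one located right after the phrase; the text before the phrase is never part of the result.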

Solutions that do not use a regex are welcome too!

3 answers:

Answer 0 (score: 4)

Try the code below.

import re

s1 = "Your signing bonus is 123,000"
s2 = "This year signing bonus is bad. the signing bonus for this year is EUR 123,000"
s3 = "The bonus is 14,456, but signing bonus."
regex = '[0-9]'

def format_string(s):
    # Split on the phrase and print every piece that contains a digit.
    for subs in s.split('signing bonus'):
        if re.search(regex, subs):
            print(subs.strip())

format_string(s1)
format_string(s2)
format_string(s3)

Output:

is 123,000
for this year is EUR 123,000
The bonus is 14,456, but
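A possible variant of the same split idea, sketched under two assumptions that go beyond the answer above: the phrase should match regardless of case (the question's regex uses re.I), and the caller wants the result returned rather than printed. The names `PHRASE` and `segments_with_digits` are hypothetical.

```python
import re

# Case-insensitive version of the phrase used as the split point (assumption).
PHRASE = re.compile(r"signing bonus", re.IGNORECASE)

def segments_with_digits(text):
    """Return the stripped pieces around the phrase that contain a digit."""
    return [part.strip() for part in PHRASE.split(text) if re.search(r"\d", part)]

print(segments_with_digits("Your Signing Bonus is 123,000"))
print(segments_with_digits("The bonus is 14,456, but signing bonus."))
```

Returning a list keeps the function testable and leaves it to the caller to decide what to do when several pieces contain digits.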

Answer 1 (score: 2)

If you can use re.sub, you can use this regex and replace the matched text with an empty string:

^[^\d\n]*signing bonus\s*|\s*signing bonus[^\d\n]*$

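In the first two cases you want to capture the string after signing bonus, but in the third case the string you want comes before signing bonus, so you need a second pattern joined to the first by alternation.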


Python code:

import re

arr = ['Your signing bonus is 123,000','This year signing bonus is bad. the signing bonus for this year is EUR 123,000','The bonus is 14,456, but signing bonus.']

for s in arr:
    print(s, '-->', re.sub(r'^[^\d\n]*signing bonus\s*|\s*signing bonus[^\d\n]*$', '', s))

Prints:

Your signing bonus is 123,000 --> is 123,000
This year signing bonus is bad. the signing bonus for this year is EUR 123,000 --> for this year is EUR 123,000
The bonus is 14,456, but signing bonus. --> The bonus is 14,456, but
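To make the two halves of the alternation easier to read, here is a sketch that names each branch and compiles them once. The names `BEFORE`, `AFTER`, `PATTERN` and `strip_phrase` are hypothetical, and the re.IGNORECASE flag is an added assumption borrowed from the question's re.I.

```python
import re

# Branch 1: from the start of the string through the last "signing bonus"
# that occurs before the first digit, plus any whitespace after it.
BEFORE = r"^[^\d\n]*signing bonus\s*"
# Branch 2: the phrase followed by a digit-free tail up to the end of the string.
AFTER = r"\s*signing bonus[^\d\n]*$"

PATTERN = re.compile(BEFORE + "|" + AFTER, re.IGNORECASE)

def strip_phrase(text):
    """Remove whichever branch matches and return what is left."""
    return PATTERN.sub("", text)

for s in ["Your Signing Bonus is 123,000",
          "The bonus is 14,456, but signing bonus."]:
    print(s, "-->", strip_phrase(s))
```

Because `[^\d\n]*` is greedy but cannot cross a digit, the first branch fails on the third string and the second branch takes over.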

Answer 2 (score: 0)

This prints the result you are after:

statements = [
    'Your signing bonus is 123,000',
    'This year signing bonus is bad. the signing bonus for this year is EUR 123,000',
    'The bonus is 14,456, but signing bonus.',
]
for statement in statements:
    # Split on the phrase; str.split always returns at least one piece.
    ans = statement.split('signing bonus')
    # Walk the pieces from last to first and print each one that contains a number.
    for i in range(len(ans) - 1, -1, -1):
        for word in ans[i].split(' '):
            try:
                int(word.replace(',', ''))  # raises ValueError for non-numeric words
            except ValueError:
                continue
            print(ans[i].strip())
            break

Output:

is 123,000
for this year is EUR 123,000
The bonus is 14,456, but
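The two rules from the question can also be written without any regex by using str.rpartition. This is only a sketch: it assumes that only the last occurrence of the phrase matters (true for the three sample strings), and the names `PHRASE` and `keep_numeric_side` are hypothetical.

```python
PHRASE = "signing bonus"

def keep_numeric_side(text):
    """Apply rules a) and b) from the question around the last phrase."""
    before, sep, after = text.rpartition(PHRASE)
    if not sep:
        # Phrase not present: return the text unchanged (an assumption,
        # the question does not cover this case).
        return text
    if any(ch.isdigit() for ch in after):
        return after.strip()   # rule a): a number follows the phrase
    return before.strip()      # rule b): no number after the phrase

for s in ["Your signing bonus is 123,000",
          "This year signing bonus is bad. the signing bonus for this year is EUR 123,000",
          "The bonus is 14,456, but signing bonus."]:
    print(keep_numeric_side(s))
```

Unlike the loop above, this returns exactly one string per input, which may or may not be what is wanted when several pieces contain numbers.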