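I have strings like these:
- Your signing bonus is 123,000
- This year signing bonus is bad. the signing bonus for this year is EUR 123,000
- The bonus is 14,456, but signing bonus.
I want output along these lines:
a) If signing bonus is followed by a number, keep that part of the string and drop everything else. See expected outputs 1 and 2.
b) If no number follows signing bonus, I should get the first part of the string. See expected output 3.
Expected output:
is 123,000
for this year is EUR 123,000
The bonus is 14,456, but
My regex: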
match1 = re.findall(r'(?<=\bSigning Bonus\b)\s*(?:\S+\b\s*){0,8}',value, re.I|re.M|re.DOTALL)
It handles outputs 1 and 2, but not output 3.
Solutions that do not use a regex are welcome too!
Answer 0 (score: 4)
Try the code below.
import re

s1 = "Your signing bonus is 123,000"
s2 = "This year signing bonus is bad. the signing bonus for this year is EUR 123,000"
s3 = "The bonus is 14,456, but signing bonus."
regex = '[0-9]'

def format_string(s):
    # Split on the phrase and print every piece that contains a digit.
    for subs in s.split('signing bonus'):
        if re.findall(regex, subs):
            print(subs.strip())

format_string(s1)
format_string(s2)
format_string(s3)
Output:
is 123,000
for this year is EUR 123,000
The bonus is 14,456, but
Answer 1 (score: 2)
If you can use re.sub, you can use this regex to replace the matched text with an empty string:
^[^\d\n]*signing bonus\s*|\s*signing bonus[^\d\n]*$
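In the first two cases you want to capture the string after signing bonus, but in the third case the string you want comes before signing bonus, so the regex needs an alternation that covers both.
Python code: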
import re
arr = ['Your signing bonus is 123,000','This year signing bonus is bad. the signing bonus for this year is EUR 123,000','The bonus is 14,456, but signing bonus.']
for s in arr:
    print(s, '-->', re.sub(r'^[^\d\n]*signing bonus\s*|\s*signing bonus[^\d\n]*$', '', s))
Prints:
Your signing bonus is 123,000 --> is 123,000
This year signing bonus is bad. the signing bonus for this year is EUR 123,000 --> for this year is EUR 123,000
The bonus is 14,456, but signing bonus. --> The bonus is 14,456, but
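The regex in the question was applied with re.I, so a case-insensitive variant may be what is actually needed. Here is a minimal sketch, assuming the phrase is always exactly "signing bonus" apart from letter case. The names PATTERN and strip_signing_bonus are hypothetical and only used for illustration.

```python
import re

# Same alternation as in the answer, compiled once with IGNORECASE.
# Assumption: mixed-case input such as "Signing Bonus" should be handled,
# because the question's own attempt used re.I.
PATTERN = re.compile(
    r'^[^\d\n]*signing bonus\s*|\s*signing bonus[^\d\n]*$',
    re.IGNORECASE,
)

def strip_signing_bonus(text):
    """Return text with the leading or trailing 'signing bonus' part removed."""
    return PATTERN.sub('', text)
```

Wrapping the substitution in a function returns the cleaned string instead of printing it, which makes it easier to reuse on a list or a DataFrame column.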
Answer 2 (score: 0)
This will print your answer:
statements = [
    'Your signing bonus is 123,000',
    'This year signing bonus is bad. the signing bonus for this year is EUR 123,000',
    'The bonus is 14,456, but signing bonus.',
]

for statement in statements:
    ans = statement.split('signing bonus')
    if not ans:
        print('')
        continue
    # Walk the pieces from last to first and print each one that contains an integer.
    for i in range(len(ans) - 1, -1, -1):
        for word in ans[i].split(' '):
            try:
                # Commas are removed so that '123,000' parses as an integer.
                int(word.replace(',', ''))
                print(ans[i].strip())
                break
            except ValueError:
                pass
Output:
is 123,000
for this year is EUR 123,000
The bonus is 14,456, but
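Rules a) and b) from the question can also be written directly, with no regex at all. The sketch below is one possible reading of those rules, not the approach of any answer here: it looks only at the text after the last occurrence of the phrase, and the function name extract_bonus_text is hypothetical. It is case-sensitive, and returning the input unchanged when the phrase is missing is an assumption, since the question does not cover that case.

```python
def extract_bonus_text(s, phrase='signing bonus'):
    """Keep the text after the last phrase if it has a digit, else the text before it."""
    # rpartition splits on the LAST occurrence of the phrase.
    before, sep, after = s.rpartition(phrase)
    if not sep:
        # Phrase not found: return the input unchanged (assumption).
        return s.strip()
    if any(ch.isdigit() for ch in after):
        # Rule a): a number follows the phrase, so keep that part.
        return after.strip()
    # Rule b): no number follows, so keep the first part.
    return before.strip()
```

Unlike the loops above, this returns a single string per input, so a sentence with digits on both sides of the phrase yields only the part after it.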