I have a string like the following:
strng ="Fiscal Year Ended March 31, 2018 Total Year (in $000's)"
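If the string above contains a year substring (e.g. 2014, 2015, etc.), I want to separate the year substring from the rest of the string.
To get the year, I am using: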
re.findall(r"\b20[012]\d\b",strng)
How can I also get the rest of the string? The expected output is:
year_substring --> '2018'
rest --> 'Fiscal Year Ended March 31, Total Year (in $000's)'
Is there a way to get both at once with a regular expression?
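For reference, here is a quick check of what the pattern `\b20[012]\d\b` from the question does and does not match, assuming Python 3. Only the first sample string comes from the question; the other two are made up for illustration. The pattern accepts only standalone four-digit numbers from 2000 to 2029, so a year such as 1999 or 2031, or digits buried inside a longer number, is skipped.

```python
import re

# Pattern from the question: "20", then 0, 1 or 2, then any digit, bounded by \b
YEAR = r"\b20[012]\d\b"

samples = [
    "Fiscal Year Ended March 31, 2018 Total Year (in $000's)",  # from the question
    "Budget for 1999 and 2031",  # hypothetical: both years fall outside 2000-2029
    "Order 120185 shipped",      # hypothetical: "2018" sits inside a longer number
]

for s in samples:
    print(re.findall(YEAR, s))
```

The word boundaries are what reject the third sample: there is no `\b` between the leading `1` and `2018`, or between `2018` and the trailing `5`.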
Answer (score: 2)
You can capture three parts (the text before the year, the year itself, and the text after it), then concatenate groups 1 and 3 to get the rest:
import re
strng ="Fiscal Year Ended March 31, 2018 Total Year (in $000's)"
# Group 1: text before the year, group 2: the year, group 3: text after it
m = re.search(r"(.*)\b(20[012]\d)\b(.*)",strng)
if m:
    print("YEAR: {}".format(m.group(2)))
    print("REST: {}{}".format(m.group(1),m.group(3)))
See the Python demo. Output:
YEAR: 2018
REST: Fiscal Year Ended March 31,  Total Year (in $000's)
If your string can contain multiple matches, use the same pattern with re.findall and re.split:
import re
strng ="Fiscal Year Ended March 31, 2018 Total Year (in $000's) and Another Fiscal Year Ended May 31, 2019 Total Year (in $000's)"
print(re.findall(r"\b20[012]\d\b",strng))
# => ['2018', '2019']
print(" ".join(re.split(r"\b20[012]\d\b",strng)))
# => Fiscal Year Ended March 31,   Total Year (in $000's) and Another Fiscal Year Ended May 31,   Total Year (in $000's)
You can also strip leading/trailing whitespace from the groups before joining them.
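Below is a minimal sketch of that stripping step, assuming Python 3. It uses `str.strip()`, which is this sketch's choice and not necessarily the helper the answer had in mind. Each piece around a year is stripped, and empty pieces (for example, when the year opens the string) are dropped before joining with a single space. The result then matches the `rest` value the question expects.

```python
import re

YEAR = r"\b20[012]\d\b"

# Single year: capture the text before the year, the year, and the text after it
strng = "Fiscal Year Ended March 31, 2018 Total Year (in $000's)"
m = re.search(r"(.*)\b(20[012]\d)\b(.*)", strng)
if m:
    year = m.group(2)
    # Strip both sides and skip empty ones so exactly one space joins them
    parts = [m.group(1).strip(), m.group(3).strip()]
    rest = " ".join(p for p in parts if p)
    print(year)  # 2018
    print(rest)  # Fiscal Year Ended March 31, Total Year (in $000's)

# Several years: strip every piece re.split returns, then join with one space
multi = ("Fiscal Year Ended March 31, 2018 Total Year (in $000's) "
         "and Another Fiscal Year Ended May 31, 2019 Total Year (in $000's)")
years = re.findall(YEAR, multi)
rest_multi = " ".join(p.strip() for p in re.split(YEAR, multi) if p.strip())
print(years)  # ['2018', '2019']
print(rest_multi)
```

Because `(.*)` in group 1 is greedy, the `re.search` version picks the last year when there are several. That is one reason the `re.findall`/`re.split` pair is the better fit for strings with multiple years.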