Use a regular expression to check whether a substring exists and, if it does, separate it from the main string in Python

Date: 2019-06-19 11:55:32

Tags: python regex python-3.x substring

I have a string like the following:

strng ="Fiscal Year Ended March 31, 2018 Total Year (in $000's)"

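If the string above contains a year substring (for example 2014, 2015, and so on), I want to separate the year substring from the rest of the string.

To get the year, I am using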

re.findall(r"\b20[012]\d\b",strng)

How can I also get the rest of the string? The expected output is

year_substring --> '2018'
rest --> 'Fiscal Year Ended March 31, Total Year (in $000's)'

Is there a way to get both at once with a regular expression?

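1 answer:

Answer 0 (score: 2)

You can capture three parts (the string before the year, the year itself, and the rest), then concatenate groups 1 and 3 to get the remainder: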

import re
strng = "Fiscal Year Ended March 31, 2018 Total Year (in $000's)"
# Group 1: text before the year, group 2: the year, group 3: text after it
m = re.search(r"(.*)\b(20[012]\d)\b(.*)", strng)
if m:
    print("YEAR: {}".format(m.group(2)))
    print("REST: {}{}".format(m.group(1), m.group(3)))

See the Python demo. Output:

YEAR: 2018
REST: Fiscal Year Ended March 31,  Total Year (in $000's)

If your string has multiple matches, use re.split with the pattern:

import re
strng = "Fiscal Year Ended March 31, 2018 Total Year (in $000's) and Another Fiscal Year Ended May 31, 2019 Total Year (in $000's)"
print(re.findall(r"\b20[012]\d\b", strng))
# => ['2018', '2019']
print(" ".join(re.split(r"\b20[012]\d\b", strng)))
# => Fiscal Year Ended March 31,   Total Year (in $000's) and Another Fiscal Year Ended May 31,   Total Year (in $000's)

See another Python demo.

The pieces keep the spaces that surrounded each year, so you can also strip the leading/trailing whitespace from the groups (for example with str.strip()).
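The whitespace clean-up mentioned at the end of the answer can be sketched as a small helper. This is only an illustration, not part of the original answer: the name `split_years` and the choice to join the stripped pieces with a single space are assumptions made here, and the pattern is the one from the question.

```python
import re

# Same pattern as in the question: years 2000-2029 as whole words.
YEAR_RE = re.compile(r"\b20[012]\d\b")


def split_years(text):
    """Return (years, rest): every matched year, and the text without them.

    Hypothetical helper for illustration only.
    """
    years = YEAR_RE.findall(text)
    # re.split keeps the whitespace that surrounded each removed year,
    # so strip every piece and drop empty ones before joining.
    pieces = (piece.strip() for piece in YEAR_RE.split(text))
    rest = " ".join(piece for piece in pieces if piece)
    return years, rest


strng = "Fiscal Year Ended March 31, 2018 Total Year (in $000's)"
years, rest = split_years(strng)
print(years)  # ['2018']
print(rest)   # Fiscal Year Ended March 31, Total Year (in $000's)
```

If the input contains no year, the helper returns an empty list and the original text with surrounding whitespace stripped, so callers can test `if years:` before using the result.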