Python Regex - Cleaning strings with re.sub

Date: 2018-08-23 15:21:35

Tags: python regex

I'm having some trouble using regex sub to remove numbers from strings. The input strings look like this:

"The Term' means 125 years commencing on and including 01 October 2015."
"125 years commencing on 25th December 1996"
"the term of 999 years from the 1st January 2011"

What I want to remove is the number and the word 'years'. I am also parsing the strings for dates using DateFinder, but DateFinder interprets the number as a date, which is why I want to remove the number.

Any thoughts on a regex expression that removes the number and the word 'years'?

2 answers:

Answer 0 (score: 0)

Try this to remove the numbers and the word years:

re.sub(r'\s+\d+|\s+years', '', text)

For example:

text="The Term' means 125 years commencing on and including 01 October 2015."

Then the output will be:

The Term' means commencing on and including October.
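To see exactly what this pattern touches, here is a small check in Python. Because `\s+\d+` matches every run of digits that follows whitespace, the day and year of the date are removed along with the term length, which is probably not what DateFinder needs. The second pattern is a hypothetical narrower variant (not part of the original answer) that only removes digits immediately followed by the word "years".

```python
import re

text = "The Term' means 125 years commencing on and including 01 October 2015."

# Pattern from the answer: any whitespace-prefixed digits, or the word "years".
broad = re.sub(r'\s+\d+|\s+years', '', text)
print(broad)

# Hypothetical narrower variant (an assumption, not the answerer's code):
# remove digits only when they are directly followed by the word "years".
narrow = re.sub(r'\s*\b\d+\s+years\b', '', text)
print(narrow)
```

With the narrower variant the date "01 October 2015" stays intact, so it can still be handed to a date parser afterwards.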

Answer 1 (score: 0)

I think this does what you want:

import re

# Sample strings from the question
my_list = [
    "The Term' means 125 years commencing on and including 01 October 2015.",
    "125 years commencing on 25th December 1996",
    "the term of 999 years from the 1st January 2011",
]

for item in my_list:
    # Remove "<digits> years"; the raw string keeps \d and \s as regex escapes
    new_item = re.sub(r"\d+\syears", "", item)
    print(new_item)

Result:

The Term' means  commencing on and including 01 October 2015.
 commencing on 25th December 1996
the term of  from the 1st January 2011

Note that you end up with some extra whitespace (maybe you want that?), but you can also add this to the "clean-up":

new_item = re.sub(r"\s+", " ", new_item)

And, because I like regexes, this trims leading and trailing whitespace:

new_item = re.sub(r"^\s+|\s+$", "", new_item)

or, without a regex:

new_item = new_item.strip()
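The steps above can be folded into one small helper. This is only a sketch: the function name `remove_term_length`, the compiled pattern `TERM_LENGTH`, and the use of word boundaries with an optional plural are assumptions for illustration, not something taken from either answer.

```python
import re

# Hypothetical pattern (an assumption): whole-word digits followed by "year" or "years".
TERM_LENGTH = re.compile(r"\b\d+\s+years?\b")


def remove_term_length(text: str) -> str:
    """Remove '<number> years' and normalise the remaining whitespace."""
    without_term = TERM_LENGTH.sub("", text)
    # split() with no arguments splits on any run of whitespace and drops
    # empty strings, so re-joining collapses runs and trims both ends.
    return " ".join(without_term.split())


samples = [
    "The Term' means 125 years commencing on and including 01 October 2015.",
    "125 years commencing on 25th December 1996",
    "the term of 999 years from the 1st January 2011",
]

cleaned = [remove_term_length(s) for s in samples]
for line in cleaned:
    print(line)
```

Ordinals such as "25th" and "1st" are left alone because the digits are not followed by whitespace and "years", so the dates survive for later parsing.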