Question

My text sometimes ends with the beginning of a specific word (for example, TRUNCATED), which I want to remove. For example:

 foo bar TRUNC
 bar foo TRUNCATED
 foo bar bar TRU
 foo
 foo bar bar bar TRUNCA

How can I remove it with a regular expression? I feel there must be a better way than this:

 corrected = re.sub(r" (T|TR|TRU|TRUN|TRUNC|TRUNCA|TRUNCAT|TRUNCATE|TRUNCATED)$", "", original)

[N.B. In case it is relevant, the truncation position is inconsistent: sometimes the text is cut off at the 20th character, other times later.]

Answer 1

You can use the following:

T(R(U(N(C(A(T(ED?)?)?)?)?)?)?)?

Code:

 corrected = re.sub(r" (T(R(U(N(C(A(T(ED?)?)?)?)?)?)?)?)$", "", original)
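
If the word to strip may change, the nested optional groups above can be generated instead of typed by hand. This is only a sketch: the names `prefix_pattern`, `TRAILING`, and `strip_truncated` are made up for illustration, the word is assumed to be non-empty, and non-capturing groups `(?:...)` are used in place of the capturing ones, which should not change what is matched.

```python
import re

def prefix_pattern(word):
    """Build a regex that matches any non-empty prefix of `word`."""
    pattern = ""
    # Walk from the last character back to the second one, wrapping the
    # tail built so far in an optional group at every step.
    for ch in reversed(word[1:]):
        pattern = "(?:" + re.escape(ch) + pattern + ")?"
    # The first character is mandatory, so the empty prefix never matches.
    return re.escape(word[0]) + pattern

# Same shape as the hand-written pattern: a space, a prefix, end of string.
TRAILING = re.compile(" " + prefix_pattern("TRUNCATED") + "$")

def strip_truncated(text):
    """Remove a trailing, possibly cut-off, TRUNCATED marker."""
    return TRAILING.sub("", text)

for line in ["foo bar TRUNC", "bar foo TRUNCATED", "foo bar bar TRU", "foo"]:
    print(repr(strip_truncated(line)))
```

A last word such as TRUCK is left alone, because it is not a prefix of TRUNCATED.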

Answer 2

Why do this with a regex at all? Just do:

 s = "foo TRU"
 l = s.rsplit(" ", 1)   # ["foo", "TRU"]: splits off the last word, if there is one
 # Caveat: this drops any last word starting with "T", not only prefixes of TRUNCATED
 final = l[0] if len(l) == 2 and l[1].startswith("T") else s
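
Since the check `startswith("T")` accepts any last word beginning with T, a stricter variant may be safer. The sketch below turns the test around and asks whether TRUNCATED starts with the last word. The function name `strip_truncated_word` and its `word` parameter are hypothetical, and `str.rpartition` is swapped in for `rsplit` purely for convenience.

```python
def strip_truncated_word(text, word="TRUNCATED"):
    """Drop the last word of `text` if it is a non-empty prefix of `word`."""
    # rpartition returns (head, separator, tail); the separator is empty
    # when the text contains no space at all.
    head, sep, last = text.rpartition(" ")
    if sep and last and word.startswith(last):
        return head
    return text

for line in ["foo bar TRUNC", "bar foo TRUNCATED", "foo", "foo Tom"]:
    print(repr(strip_truncated_word(line)))
```

Like the regex in the question, this leaves a string untouched when it consists of the marker alone, because there is no preceding space.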
