Question

list_1=["TP","MP","TS"]

list_2=["RTS:Id The package is delivered to TEMPR13TS0002",
        "RTS:Id The package is delivered to TEMPS19TS0332"]

I am trying to find the elements of list_1 inside substrings of the elements of list_2, and replace them as follows:

For TS, the output should be:

list_2=["RTS:Id The package is delivered to TEMPR13 TS",
        "RTS:Id The package is delivered to TEMPS19 TS"]

That is, insert a space to the left of TS and delete everything to its right.

Instead, I get this output:

list_2=["R TS:Id The package is delivered to TEMPR13 TS",
        "R TS:Id The package is delivered to TEMPS19 TS"]

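The problem is that the same replacement is also applied to the RTS substring. I only want to apply the operation to substrings longer than 10 characters.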

My list comprehension with a regex is:

  updated_list = [re.sub(r'(' + '|'.join(list_1) + r')\S+',
                         r' \1', i) for i in list_2]

Answer 1

A solution, though not a very efficient one:

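import re
s = "RTS:Id The package is delivered to TEMPR13TS0002"
# \w{11,} matches runs of 11+ word characters, i.e. words longer than 10
pattern = re.compile(r'\w{11,}')
# Inside each long word, replace "TS" and everything after it with " TS"
print(pattern.sub(lambda m: re.sub("TS.*", " TS", m.group(0)), s))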
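The same callback idea can be applied to every element of list_2 with a list comprehension. This is a minimal sketch assuming Python 3 and, as in the answer above, that only TS needs handling; the helper name `fix_long_words` is hypothetical.

```python
import re

list_2 = ["RTS:Id The package is delivered to TEMPR13TS0002",
          "RTS:Id The package is delivered to TEMPS19TS0332"]

# Runs of 11 or more word characters, i.e. "words" longer than 10.
# ":" is not a word character, so "RTS:Id" splits into "RTS" and "Id".
LONG_WORD = re.compile(r'\w{11,}')


def fix_long_words(text, code="TS"):
    """Hypothetical helper: inside each long word only, replace `code`
    and everything after it with a space followed by `code`."""
    tail = re.compile(re.escape(code) + r'.*')
    return LONG_WORD.sub(lambda m: tail.sub(' ' + code, m.group(0)), text)


updated_list = [fix_long_words(s) for s in list_2]
print(updated_list)
```

Short words such as `RTS` never reach the inner substitution, which is what keeps `RTS:Id` intact.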

Answer 2

Regular expressions are not well suited to checking string length.

You can either change list_1 so that it distinguishes between RTS:Id and 13TS00, or use other Python functions to search, check and replace the strings.

For example, only match TS when it is followed by a digit: list_1=["TP","MP","TS\d"]
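With `TS\d` inside the capture group, `\1` would also contain the digit (giving ` TS0` rather than ` TS`). A minimal sketch of the same "followed by a digit" idea that keeps the captured code clean uses a lookahead `(?=\d)` instead, assuming, as in the example data, that every code to split off is immediately followed by a digit. Under that condition the `TS` in `RTS:Id` (followed by `:`) and the `MP` in `TEMPR13TS0002` (followed by `R`) are left alone.

```python
import re

list_1 = ["TP", "MP", "TS"]
list_2 = ["RTS:Id The package is delivered to TEMPR13TS0002",
          "RTS:Id The package is delivered to TEMPS19TS0332"]

# Alternation of the codes, escaped in case a code contains regex metacharacters.
codes = '|'.join(map(re.escape, list_1))
# A code followed by a digit (checked by the lookahead, so the digit is not
# captured), then the rest of the non-space run.
pattern = re.compile(r'(' + codes + r')(?=\d)\S*')

# Replace the whole match with a space plus the captured code.
updated_list = [pattern.sub(r' \1', s) for s in list_2]
print(updated_list)
```

No length check is needed here: the digit condition alone separates `TEMPR13TS0002` from `RTS:Id`.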

Using re.sub only on substrings of strings longer than 11 characters in Python
