Question

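I am new to Python. This can probably be done with a regex. I want to search a string for specific substrings and remove the characters immediately before and after them.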

Example 1

Input:"This is the consignment no 1234578TP43789"
Output:"This is the consignment no TP"

Example 2

Input:"Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
Output:"Consignment no TP is on its way on vehicle no MP"

I have a list of these acronyms (MP, TP) to search for in the string.

Answer 1

You can use re.sub:

>>> import re
>>> string="This is the consignment no 1234578TP43789"
>>> re.sub(r'\d+(TP|MP)\d+', r'\1', string)
'This is the consignment no TP'

>>> string="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
>>> re.sub(r'\d+(TP|MP)\d+', r'\1', string)
'Consignment no TP is on its way on vehicle no MP'

What does it do?

\d+ matches one or more digits.
(TP|MP) matches TP or MP and captures it as group 1 (\1). The captured string then replaces the entire match.
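As a minimal sketch, the same pattern can be written with re.VERBOSE so that each part carries its own comment. The name PATTERN is only illustrative and is not part of the original answer.

```python
import re

# Same pattern as above, spelled out piece by piece.
PATTERN = re.compile(
    r"""
    \d+        # one or more digits before the acronym
    (TP|MP)    # the acronym itself, captured as group 1
    \d+        # one or more digits after the acronym
    """,
    re.VERBOSE,
)

result = PATTERN.sub(r"\1", "This is the consignment no 1234578TP43789")
print(result)  # This is the consignment no TP
```

Because both \d+ parts require at least one digit, a token such as 1234TP, with no digits after the acronym, is left unchanged by this pattern.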

If characters other than digits can appear before and after TP / MP, we can use \S, which matches any non-whitespace character. For example,

>>> string="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
>>> re.sub(r'\S+(TP|MP)\S+', r'\1', string)
'Consignment no TP is on its way on vehicle no MP'
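One thing to keep in mind: \S is much broader than \d, so it can also match inside ordinary words that happen to contain an acronym. The short comparison below illustrates this with a made-up input string (it is not from the question).

```python
import re

# Hypothetical input: the ordinary word "OUTPUT" happens to contain "TP".
text = "SEE OUTPUT 123TP456"

digits_only = re.sub(r"\d+(TP|MP)\d+", r"\1", text)
non_space = re.sub(r"\S+(TP|MP)\S+", r"\1", text)

print(digits_only)  # SEE OUTPUT TP
print(non_space)    # SEE TP TP
```

If the surrounding characters are always digits, the \d version is probably the safer choice.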

Edit

Using a list comprehension, you can iterate over the list of strings and apply the replacement to each one:

>>> list_1 = ["TP", "MP", "DCT"]
>>> list_2 = ["This is the consignment no 1234578TP43789",
...           "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"]
>>> [re.sub(r'\d+(' + '|'.join(list_1) + r')\d+', r'\1', string) for string in list_2]
['This is the consignment no TP', 'Consignment no TP is on its way on vehicle no MP']
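If the acronym list comes from outside the program, it may be worth escaping each entry and compiling the pattern once. The following is only a sketch of that idea. The helper names build_pattern and strip_digits_around are hypothetical, and "T.P" is an invented acronym used to show the effect of escaping.

```python
import re

def build_pattern(acronyms):
    """Compile one pattern that matches any acronym surrounded by digits."""
    # re.escape keeps characters such as "." or "+" in an acronym literal.
    alternation = "|".join(re.escape(a) for a in acronyms)
    return re.compile(r"\d+(" + alternation + r")\d+")

def strip_digits_around(text, pattern):
    """Replace each digits-acronym-digits run with the acronym alone."""
    return pattern.sub(r"\1", text)

pattern = build_pattern(["TP", "MP", "DCT", "T.P"])
list_2 = [
    "This is the consignment no 1234578TP43789",
    "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890",
]
cleaned = [strip_digits_around(s, pattern) for s in list_2]
print(cleaned)
```

With the escaped "T.P" entry, 12T.P34 becomes T.P while 12TXP34 is left alone. Without re.escape, the dot would match any character.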

Answer 2

You can use strip to remove the digits from both ends of each word.

strg="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
strg=' '.join([word.strip('0123456789') for word in strg.split()])
print(strg) # Consignment no TP is on its way on vehicle no MP

To strip only the words that contain a reserved word, do the stripping in a loop over the reserved words:

strg="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890 200DG"
reserved=['MP','TP']
for res in reserved:
    strg=' '.join([word.strip('0123456789') if (res in word) else word for word in strg.split()])
print(strg) # Consignment no TP is on its way on vehicle no MP 200DG
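The loop above splits and rejoins the string once per reserved word. As a possible alternative, the sketch below makes a single pass and checks each word against all reserved words with any(). The function name strip_reserved is hypothetical.

```python
def strip_reserved(text, reserved):
    """Strip leading/trailing digits only from words containing a reserved word."""
    words = []
    for word in text.split():
        # Only touch words that contain at least one reserved acronym.
        if any(res in word for res in reserved):
            word = word.strip("0123456789")
        words.append(word)
    return " ".join(words)

strg = "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890 200DG"
print(strip_reserved(strg, ["MP", "TP"]))
# Consignment no TP is on its way on vehicle no MP 200DG
```

Like the original, this relies on split() and ' '.join(), so runs of whitespace in the input are collapsed to single spaces.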

Removing the characters before and after a specific substring in a string in Python
