Removing the characters before and after a specific substring in a Python string

Time: 2016-11-16 16:20:07

Tags: python regex regex-lookarounds

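I am new to Python. Can this be done with a regex? I want to search a string for specific substrings and remove the characters immediately before and after each one.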

Example 1

Input:"This is the consignment no 1234578TP43789"
Output:"This is the consignment no TP"

Example 2

Input:"Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
Output:"Consignment no TP is on its way on vehicle no MP"

I have a list of the acronyms (MP, TP) that I need to search for in the string.

2 answers:

Answer 0 (score: 7)

You can use re.sub:

>>> import re
>>> string="This is the consignment no 1234578TP43789"
>>> re.sub(r'\d+(TP|MP)\d+', r'\1', string)
'This is the consignment no TP'

>>> string="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
>>> re.sub(r'\d+(TP|MP)\d+', r'\1', string)
'Consignment no TP is on its way on vehicle no MP'

What does it do?

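  • \d+ matches one or more digits.
  • (TP|MP) matches either TP or MP and captures it in group \1. The captured string is then used to replace the entire matched string.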
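To see what the pattern captures before any replacement happens, here is a minimal sketch using re.search on the first example string. The variable names text and match are only illustrative.

```python
import re

text = "This is the consignment no 1234578TP43789"
match = re.search(r'\d+(TP|MP)\d+', text)

# group(0) is the whole matched run: digits, acronym, digits
print(match.group(0))  # 1234578TP43789
# group(1) is only the part inside the parentheses
print(match.group(1))  # TP
```

re.sub replaces everything in group(0) with whatever the replacement string expands to, which here is just group(1).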

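If arbitrary characters (not only digits) can appear before and after TP / MP, we can use \S, which matches any character other than whitespace. For example,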

>>> string="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
>>> re.sub(r'\S+(TP|MP)\S+', r'\1', string)
'Consignment no TP is on its way on vehicle no MP'
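The two patterns give the same result on the strings above, but they differ once letters surround the acronym. A small sketch of the difference, using a made-up input string that is not from the question:

```python
import re

# Hypothetical input: letters as well as digits around the acronym
text = "ref AB12TP34CD"

# \d+ only consumes the digits that touch the acronym
digits_only = re.sub(r'\d+(TP|MP)\d+', r'\1', text)
# \S+ consumes every non-whitespace character around it
non_space = re.sub(r'\S+(TP|MP)\S+', r'\1', text)

print(digits_only)  # ref ABTPCD
print(non_space)    # ref TP
```

Which one is appropriate depends on whether the surrounding characters are always digits.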

Edit

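Using a list comprehension, you can iterate over the list of strings and apply the replacement to each one, building the alternation from the list of acronyms: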

>>> list_1=["TP","MP","DCT"]
>>> list_2=["This is the consignment no 1234578TP43789","Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"]
>>> [ re.sub(r'\d+(' +  '|'.join(list_1) + r')\d+', r'\1', string) for string in list_2 ]
['This is the consignment no TP', 'Consignment no TP is on its way on vehicle no MP']
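If the acronym list comes from elsewhere, it may be worth escaping each entry and compiling the pattern once. The following is a sketch under that assumption; strip_around is a hypothetical helper name, not part of the re module.

```python
import re

def strip_around(acronyms, texts):
    """Remove the digits directly before and after each acronym (hypothetical helper)."""
    # re.escape guards against acronyms that contain regex metacharacters
    alternation = '|'.join(re.escape(a) for a in acronyms)
    # Compile once, then reuse the pattern for every string
    pattern = re.compile(r'\d+(' + alternation + r')\d+')
    return [pattern.sub(r'\1', text) for text in texts]

acronyms = ["TP", "MP", "DCT"]
texts = [
    "This is the consignment no 1234578TP43789",
    "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890",
]
result = strip_around(acronyms, texts)
print(result)
# ['This is the consignment no TP', 'Consignment no TP is on its way on vehicle no MP']
```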

Answer 1 (score: 0)

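You can use strip to remove characters from both ends of a string. Here it is applied to each whitespace-separated word to drop the leading and trailing digits.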

strg="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
strg=' '.join([word.strip('0123456789') for word in strg.split()])
print(strg) # Consignment no TP is on its way on vehicle no MP

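If only the words that contain a reserved keyword should be stripped, do the stripping inside a loop over the keywords: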

strg="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890 200DG"
reserved=['MP','TP']
for res in reserved:
    strg=' '.join([word.strip('0123456789') if (res in word) else word for word in strg.split()])
print(strg) # Consignment no TP is on its way on vehicle no MP 200DG
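The loop above rebuilds the whole string once per keyword. As a sketch of an equivalent single pass, any() can check all keywords for each word; strip_if_reserved is a hypothetical helper name introduced only for this illustration.

```python
strg = "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890 200DG"
reserved = ['MP', 'TP']

def strip_if_reserved(word):
    # Strip digits from both ends only when the word contains a reserved keyword
    if any(res in word for res in reserved):
        return word.strip('0123456789')
    return word

result = ' '.join(strip_if_reserved(word) for word in strg.split())
print(result)  # Consignment no TP is on its way on vehicle no MP 200DG
```

Like the original, this splits on whitespace and rejoins with single spaces, so runs of spaces in the input are collapsed.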