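I am new to Python. This can probably be done with a regex. I want to search a string for specific substrings and remove the characters immediately before and after each one.
Example 1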
Input:"This is the consignment no 1234578TP43789"
Output:"This is the consignment no TP"
Example 2
Input:"Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
Output:"Consignment no TP is on its way on vehicle no MP"
I have a list of the acronyms (MP, TP) that I need to search for in the string.
Answer 0 (score: 7)
You can use re.sub:
>>> import re
>>> string="This is the consignment no 1234578TP43789"
>>> re.sub(r'\d+(TP|MP)\d+', r'\1', string)
'This is the consignment no TP'
>>> string="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
>>> re.sub(r'\d+(TP|MP)\d+', r'\1', string)
'Consignment no TP is on its way on vehicle no MP'
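What does it do?
\d+ matches one or more digits.
(TP|MP) matches TP or MP and captures it in group 1, which the replacement string refers to as \1. The captured text replaces the whole match, so the digits on either side are dropped.
If characters other than digits can appear before and after TP / MP, we can use \S, which matches any non-whitespace character. For example,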
>>> string="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
>>> re.sub(r'\S+(TP|MP)\S+', r'\1', string)
'Consignment no TP is on its way on vehicle no MP'
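The two patterns give the same result on the examples above, but they differ once a token contains something other than digits. The sketch below uses a made-up input (the `ref:` prefix and trailing comma are assumptions, not part of the question) to show the difference: `\d+` only consumes the digits, while `\S+` consumes the whole whitespace-delimited token.

```python
import re

# Same capture group, different "surrounding" character classes.
DIGITS_ONLY = re.compile(r'\d+(TP|MP)\d+')
ANY_NON_SPACE = re.compile(r'\S+(TP|MP)\S+')

# Hypothetical input: the token has a prefix and trailing punctuation.
sample = "ref:1234578TP43789, sent"

# Only the digits around TP are removed; "ref:" and "," survive.
digits_result = DIGITS_ONLY.sub(r'\1', sample)

# Everything in the token except TP is removed.
non_space_result = ANY_NON_SPACE.sub(r'\1', sample)

print(digits_result)     # ref:TP, sent
print(non_space_result)  # TP sent
```

Note that both patterns use `+`, so they need at least one character on each side of the acronym; a token such as `TP43789` is left unchanged by either of them.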
EDIT
Using a list comprehension, you can iterate over the list and replace all the strings:
>>> list_1=["TP","MP","DCT"]
>>> list_2=["This is the consignment no 1234578TP43789","Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"]
>>> [ re.sub(r'\d+(' + '|'.join(list_1) + r')\d+', r'\1', string) for string in list_2 ]
['This is the consignment no TP', 'Consignment no TP is on its way on vehicle no MP']
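If the acronym list comes from user input or a file, it may be safer to escape each entry before joining, so that a regex metacharacter in an acronym is matched literally. A minimal sketch of that idea follows; the helper names `build_pattern` and `strip_around` are made up for illustration and are not part of the answer above.

```python
import re

def build_pattern(acronyms):
    """Compile one pattern that matches digits around any of the acronyms."""
    # re.escape makes characters such as '+' or '.' match literally.
    alternation = '|'.join(re.escape(a) for a in acronyms)
    return re.compile(r'\d+(' + alternation + r')\d+')

def strip_around(acronyms, texts):
    """Replace each digits-acronym-digits run with the bare acronym."""
    pattern = build_pattern(acronyms)  # compiled once, reused for every text
    return [pattern.sub(r'\1', text) for text in texts]

list_1 = ["TP", "MP", "DCT"]
list_2 = [
    "This is the consignment no 1234578TP43789",
    "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890",
]

cleaned = strip_around(list_1, list_2)
print(cleaned)
# ['This is the consignment no TP', 'Consignment no TP is on its way on vehicle no MP']
```

Compiling the pattern once also avoids rebuilding the same string for every element of the list.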
Answer 1 (score: 0)
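You can use strip to remove characters from the start and end of a string. Here it is applied to each whitespace-separated word: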
strg="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
strg=' '.join([word.strip('0123456789') for word in strg.split()])
print(strg) # Consignment no TP is on its way on vehicle no MP
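It may help to see what strip does to a single word. The sketch below (the helper name `strip_digits` and the sample words are assumptions for illustration) shows that the argument is treated as a set of characters to remove from both ends, that digits in the middle of a word are kept, and that a word made only of digits becomes empty.

```python
import string

def strip_digits(word):
    """Remove leading and trailing digits from one word."""
    # string.digits is '0123456789'; strip treats it as a set of characters,
    # not as a prefix or suffix to match exactly.
    return word.strip(string.digits)

print(strip_digits("1234578TP43789"))      # TP
print(strip_digits("12AB34CD56"))          # AB34CD  (inner digits are kept)
print(repr(strip_digits("2024")))          # ''      (an all-digit word disappears)
```

The last case matters for the one-liner above: a purely numeric word in the input would be reduced to an empty string and leave a double space after the join.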
To strip only the words that contain a reserved word, put the strip inside a loop:
strg="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890 200DG"
reserved=['MP','TP']
for res in reserved:
    strg=' '.join([word.strip('0123456789') if (res in word) else word for word in strg.split()])
print(strg) # Consignment no TP is on its way on vehicle no MP 200DG
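The loop above splits and rejoins the string once per reserved word. One possible variant, sketched here with a hypothetical helper named `strip_reserved`, walks the words a single time and uses any() to check all reserved words at once.

```python
import string

def strip_reserved(text, reserved):
    """Strip surrounding digits only from words containing a reserved word."""
    words = []
    for word in text.split():
        # Plain substring test, same as `res in word` in the loop version.
        if any(res in word for res in reserved):
            word = word.strip(string.digits)
        words.append(word)
    return ' '.join(words)

strg = "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890 200DG"
result = strip_reserved(strg, ['MP', 'TP'])
print(result)  # Consignment no TP is on its way on vehicle no MP 200DG
```

Like the loop version, this is a substring test, so a word that merely contains a reserved word (for example `12TEMP34`) would also be stripped; the regex approach in the other answer is stricter in that respect.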