Question

I have a string in Python:

Tt = "This is a <\"string\">string, It should be <\"changed\">changed to <\"a\">a nummber."

print(Tt)

This is a <"string">string, It should be <"changed">changed to <"a">a nummber.

As you can see, some words are repeated inside the <\" \"> parts.

My question is: how do I delete those repeated parts (the ones delimited by the characters shown above)?

The result should be:

'This is a string, It should be changed to a nummber.'

Answer 1

Use a regular expression:

import re
Tt = re.sub('<\".*?\">', '', Tt)

Note the ? after the *. It makes the expression non-greedy, so it tries to match as few characters as possible between <\" and \">.
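To make the difference concrete, here is a small sketch comparing the non-greedy pattern with its greedy counterpart on the string from the question. The variable names lazy and greedy are only for illustration.

```python
import re

Tt = "This is a <\"string\">string, It should be <\"changed\">changed to <\"a\">a nummber."

# Non-greedy: .*? stops at the first "> it can reach, so each marker is removed on its own.
lazy = re.sub(r'<".*?">', '', Tt)

# Greedy: .* runs from the first <" to the last ">, swallowing the text in between.
greedy = re.sub(r'<".*">', '', Tt)

print(lazy)    # This is a string, It should be changed to a nummber.
print(greedy)  # This is a a nummber.
```

The greedy version removes everything between the first opening marker and the last closing one, which is why the ? matters here.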

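James's solution only works when each delimiter consists of a single character (< and >). In that case you can use a negated character class such as [^>]. If you want to remove substrings delimited by character sequences (for example, begin and end), you should use a non-greedy regular expression (i.e. .*?).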
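As a rough illustration of the multi-character case, the sketch below removes everything between the words begin and end. The input string is made up for this example, and the trailing \s* is an addition that only tidies up the leftover space.

```python
import re

# Hypothetical input: the words "begin" and "end" act as delimiters.
text = "one begin two end three begin four end five"

# .*? keeps each match as short as possible, so the two delimited
# blocks are removed separately rather than as one long span.
cleaned = re.sub(r'begin.*?end\s*', '', text)

print(cleaned)  # one three five
```

A negated character class cannot express "anything up to the sequence end", which is why the non-greedy quantifier is the simpler tool for this case.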

Answer 2

I would use a quick regular expression:

import re
Tt = "This is a <\"string\">string, It should be <\"changed\">changed to <\"a\">a nummber."
print(re.sub("<[^<]+>", "", Tt))
#Out: This is a string, It should be changed to a nummber.

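Ah, this is similar to Igor's post; he beat me to it. Instead of making the expression non-greedy, this pattern refuses to match if there is another start tag "<" inside it, so it only matches a start tag followed by an end tag ">".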
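The two approaches differ when a stray "<" appears before a real marker. The following sketch uses an invented input string to show that difference; it is not taken from the original question.

```python
import re

# Hypothetical input with a stray "<" ahead of a real marker.
text = 'a < b and <"x">x'

# Negated class: a match may not contain another "<", so the stray one is left alone.
negated = re.sub(r'<[^<]+>', '', text)

# Non-greedy: the match starts at the stray "<" and runs to the first ">".
lazy = re.sub(r'<.*?>', '', text)

print(negated)  # a < b and x
print(lazy)     # a x
```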

How do I remove substrings marked with special characters from a string?
