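Regex for text not enclosed in angle brackets - Python

Time: 2013-10-15 10:14:01

Tags: python regex string brackets

I need to get the text that is not enclosed in angle brackets.

My input looks like this: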

> whatever something<X="Y" zzz="abc">this is a foo bar <this is a
> < whatever>and i ><only want this

and the desired output is:

> whatever something
> this is a foo bar <this is a
>
> and i ><only want this

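I first tried to detect whatever is inside the brackets and then remove it. But it seems I am matching only the attributes inside <> rather than the whole <...>. How can I get the desired output?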

import re
x = """whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n< whatever>and i ><only want this"""
re.findall("<([^>]*)>", x.strip())
['X="Y" zzz="abc"', 'this is a\n    ', ' whatever']

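1 answer:

Answer 0: (score: 1)

You should move the parentheses in your regex so that they sit outside the angle brackets (and remove the ones you already have). That captures everything between <...>, including the brackets themselves. You also need to exclude \n characters from the match to get the desired output.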

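import re

x = """whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n\
        < whatever>and i ><only want this"""
# Capture each complete <...> tag, brackets included. Excluding \n keeps a
# match from running across a line break.
y = re.findall(r"(<[^>\n]*>)", x.strip())
z = x
for i in y:
    # Replace every occurrence of the captured tag with a newline.
    z = z.replace(i, '\n')
print(z)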
whatever something
this is a foo bar <this is a

and i ><only want this
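
As a possible alternative to the findall-then-replace loop above, the same pattern can be handed to re.sub, which replaces each whole match directly, so no capturing group is needed. This is only a sketch: the names text, TAG and strip_tags are made up for illustration, and the input is the question's string without the extra indentation.

```python
import re

# Sample input taken from the question (assumed: no indentation on line 2).
text = (
    'whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n'
    '< whatever>and i ><only want this'
)

# Same character class as in the answer; re.sub replaces the whole match,
# so the pattern does not need parentheses.
TAG = re.compile(r"<[^>\n]*>")


def strip_tags(s):
    """Replace each complete, single-line <...> span with a newline."""
    return TAG.sub("\n", s)


result = strip_tags(text)
print(result)
```

The unclosed `<this is a` and `<only want this` are left alone because the pattern requires a closing `>` on the same line.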

The parentheses mark the part of each match that findall groups and returns.
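
To illustrate how the position of the parentheses changes what findall returns, here is a small sketch. The sample string s is invented for the demonstration and is not from the question.

```python
import re

# Hypothetical sample string, chosen only to show the grouping behaviour.
s = 'a<b c="d">e<f>g'

# No group: findall returns each whole match.
no_group = re.findall(r"<[^>\n]*>", s)
# Group inside the brackets: only the text between < and > is returned.
inner = re.findall(r"<([^>\n]*)>", s)
# Group around the brackets: the whole tag is returned.
outer = re.findall(r"(<[^>\n]*>)", s)

print(no_group)  # ['<b c="d">', '<f>']
print(inner)     # ['b c="d"', 'f']
print(outer)     # ['<b c="d">', '<f>']
```

With a single group, findall returns the group's text rather than the full match, which is why the question's pattern yielded only the attributes.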