Question

I need to get the text that is not enclosed in angle brackets.

My input looks like this:

> whatever something<X="Y" zzz="abc">this is a foo bar <this is a
> < whatever>and i ><only want this

and the desired output is:

> whatever something
this is a foo bar <this is a
> 
and i ><only want this

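I first tried to detect what is inside the brackets and then remove it. But it seems I am matching the attributes inside <> rather than the whole <...>. How can I achieve the desired output?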

import re
x = """whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n< whatever>and i ><only want this"""
re.findall("<([^>]*)>", x.strip())
['X="Y" zzz="abc"', 'this is a\n    ', ' whatever']

Answer 1

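You should move the parentheses in your regex so that they sit outside the angle brackets (and remove the ones you already have). That captures all the text between <...>, including the brackets themselves. You also need to exclude \n characters from the match to get the desired output.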

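import re
x =  """whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n\
        < whatever>and i ><only want this"""
# Capture each whole <...> span, brackets included; [^>\n] stops a match at a newline.
y = re.findall("(<[^>\n]*>)", x.strip())
z = x
# Replace every captured span with a newline.
for i in y:
    z = z.replace(i, '\n')
print(z)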
whatever something
this is a foo bar <this is a

and i ><only want this
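
As an alternative sketch (not part of the original answer), the find-then-replace loop can probably be collapsed into a single re.sub call. It assumes the same rule as above: a span to remove is "<", then any characters other than ">" or a newline, then ">". The names TAG and strip_tags are hypothetical, chosen only for illustration.

```python
import re

# Same pattern as in the answer: "<", then anything except ">" or a newline, then ">".
TAG = re.compile(r"<[^>\n]*>")

def strip_tags(text):
    """Replace each single-line <...> span with a newline (hypothetical helper)."""
    return TAG.sub("\n", text)

x = 'whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n< whatever>and i ><only want this'
print(strip_tags(x))
```

Because the pattern refuses to cross a newline, the unclosed "<this is a" and "<only want this" are left in place, which is what the desired output asks for.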

The parentheses mark the text that findall groups, and therefore returns, when it finds a match.
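
A small illustration of that behavior, using a made-up sample string s: with one group in the pattern, re.findall returns only the group's text; with the group around the whole pattern, or with no group at all, it returns the full match.

```python
import re

s = 'a<b c>d'  # made-up sample input

# Group inside the brackets: findall returns only the group's text.
inner = re.findall(r"<([^>]*)>", s)

# Group around the whole pattern: the brackets are included.
outer = re.findall(r"(<[^>]*>)", s)

# No group at all: findall returns the whole match.
whole = re.findall(r"<[^>]*>", s)

print(inner)  # ['b c']
print(outer)  # ['<b c>']
print(whole)  # ['<b c>']
```

This is why the pattern in the question returned the attributes rather than the whole <...> span.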

Regex for text not enclosed in angle brackets - Python
