I need to extract the text that is not enclosed in angle brackets.
My input looks like this:
> whatever something<X="Y" zzz="abc">this is a foo bar <this is a
> < whatever>and i ><only want this
and the desired output is:
> whatever something
> this is a foo bar <this is a
>
> and i ><only want this
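I first tried to detect what is inside the brackets and then remove it. But it seems I am matching the attributes inside <> rather than the whole <...>. How can I achieve the desired output?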
import re
x = """whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n< whatever>and i ><only want this"""
re.findall("<([^>]*)>", x.strip())
['X="Y" zzz="abc"', 'this is a\n ', ' whatever']
Answer 0 (score: 1)
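In your regex, move the parentheses so that they sit outside the angle brackets (and drop the ones you have now). That captures all the text between <...>, including the brackets themselves. You also need to exclude \n characters from the match to get the desired output.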
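import re
x = """whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n\
< whatever>and i ><only want this"""
# Capture each whole <...> tag, brackets included, without crossing a newline
y = re.findall(r"(<[^>\n]*>)", x.strip())
z = x
for i in y:
    # Replace every captured tag with a newline
    z = z.replace(i, '\n')
print(z)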
whatever something
this is a foo bar <this is a

and i ><only want this
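The find-then-replace loop above can probably be collapsed into a single substitution. The following is only a minimal sketch, assuming `re.sub` is acceptable for this task; the names `TAG` and `strip_tags` are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical names (TAG, strip_tags), chosen for illustration only.
# A tag is "<", then any run of characters that are neither ">" nor a
# newline, then ">".
TAG = re.compile(r"<[^>\n]*>")

def strip_tags(text):
    """Replace every single-line <...> tag in text with a newline."""
    return TAG.sub("\n", text)

x = 'whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n< whatever>and i ><only want this'
result = strip_tags(x)
print(result)
```

On this input the result should match what the loop prints. No capturing group is needed here, because `re.sub` always replaces the whole match.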
The parentheses in the pattern tell findall which part of each match to return as a group.
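To make the effect of the group placement concrete, here is a small sketch on a made-up string (`s` is an assumption for illustration, not the asker's input). It compares a group inside the brackets, a group around the brackets, and no group at all.

```python
import re

# Hypothetical sample string, used only to illustrate group placement.
s = 'a<b c="d">e'

# Group inside the brackets: findall returns only the captured inner text.
inner = re.findall(r"<([^>\n]*)>", s)

# Group around the brackets: findall returns the whole tag.
whole = re.findall(r"(<[^>\n]*>)", s)

# No group at all: findall returns the full match, same as the line above.
no_group = re.findall(r"<[^>\n]*>", s)

print(inner)     # ['b c="d"']
print(whole)     # ['<b c="d">']
print(no_group)  # ['<b c="d">']
```

With exactly one group, findall returns the text of that group; with none, it returns the entire match, so the outer parentheses are optional in this case.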