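I have the following raw text output and need to extract selected parts of it, but my regular expression in Python does not capture only the parts I want. My string is: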
label 123 start
int
some other random text
exit
exit
label 576 start
int
some other random text
exit
exit
label 888 start
explanation jgfjgjgj
some random text
exit
up up
exit
label 902 start
explanation jgfjgjgj
some random text
exit
up up
exit
label 456 start
explanation jgfjgjgj
some random text
exit
up up
exit
From the text above I want to get the following as separate items:
Item 1
label 888 start
explanation jgfjgjgj
some random text
exit
up up
exit
Item 2
label 902 start
explanation jgfjgjgj
some random text
exit
up up
exit
Item 3
label 456 start
explanation jgfjgjgj
some random text
exit
up up
exit
I have the following regular expression:
(label)\s\d{1,4}(.*?)(?=\s*explanation)(.*?)\s+up up
This also captures the following two items, which I do not want:
label 123 start
start
some other random text
exit
exit
label 576 start
start
some other random text
exit
exit
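My pattern relies on a lookahead for the word "explanation" and is meant to capture only items that start at "label" and finish at "up up". However, the first match also includes everything from label 123 and label 576. I thought the lookahead would stop that, but those blocks are captured anyway.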
Answer 0 (score: 0)
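I assume what you are looking for is a stanza that:
starts with an unindented label followed by an integer,
contains an indented explanation line,
is terminated by up up followed by an unindented exit.
This corresponds to the regular expression: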
(?mx)^label[ \t]+\d{1,4}.* # Unindented line starting label
(?:\n[ \t]+.*)*? # Some indented lines (non-greedy)
(?:\n[ \t]+explanation.*) # Indented explanation
(?:\n[ \t]+.*)* # More indented lines
\nup\ up\nexit\n # Termination sequence including final newline
Test:
import re

# Indentation restored: the pattern below only matches when the lines
# between "label" and the terminator are indented.
text = """label 123 start
    int
    some other random text
    exit
exit
label 576 start
    int
    some other random text
    exit
exit
label 888 start
    explanation jgfjgjgj
    some random text
    exit
up up
exit
label 902 start
    explanation jgfjgjgj
    some random text
    exit
up up
exit
label 456 start
    explanation jgfjgjgj
    some random text
    exit
up up
exit
"""
r = r'''(?mx)
^label[ \t]+\d{1,4}.* # Unindented line starting label
(?:\n[ \t]+.*)*? # Some indented lines (non-greedy)
(?:\n[ \t]+explanation.*) # Indented explanation
(?:\n[ \t]+.*)* # More indented lines
\nup\ up\nexit\n # Termination sequence including final newline
'''
for i, m in enumerate(re.findall(r, text)):
    print("Item " + str(i) + "\n" + m)
Item 0
label 888 start
explanation jgfjgjgj
some random text
exit
up up
exit
Item 1
label 902 start
explanation jgfjgjgj
some random text
exit
up up
exit
Item 2
label 456 start
explanation jgfjgjgj
some random text
exit
up up
exit
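If the indentation in your real data is unreliable (or has been stripped), the pattern above will not match, because it depends on the inner lines being indented. As an alternative, here is a minimal sketch that does not rely on indentation: it splits the text in front of every label line and keeps only the blocks that contain an explanation line and end with up up followed by exit. The function name extract_explained_stanzas and the shortened sample are my own, hypothetical choices, and the sketch assumes that nothing else follows the final exit of a stanza before the next label line.

```python
import re

def extract_explained_stanzas(text):
    """Return blocks that start at a 'label <number>' line, contain an
    'explanation' line, and end with 'up up' followed by 'exit'."""
    # Split (zero-width) in front of every line that starts with "label <number>".
    # Splitting on empty matches requires Python 3.7 or later.
    blocks = re.split(r"(?m)^(?=[ \t]*label[ \t]+\d{1,4}\b)", text)
    items = []
    for block in blocks:
        # Compare on stripped lines so indentation does not matter.
        lines = [line.strip() for line in block.strip().splitlines()]
        if len(lines) < 3:
            continue
        has_explanation = any(line.startswith("explanation") for line in lines[1:])
        ends_properly = lines[-2:] == ["up up", "exit"]
        if has_explanation and ends_properly:
            # Keep the block text as it was, minus surrounding whitespace.
            items.append(block.strip())
    return items

# Shortened, unindented sample modeled on the question's data (an assumption).
sample = """label 123 start
int
some other random text
exit
exit
label 888 start
explanation jgfjgjgj
some random text
exit
up up
exit
"""

for i, item in enumerate(extract_explained_stanzas(sample), start=1):
    print(f"Item {i}\n{item}")
```

This trades the single regular expression for a split plus a filter, which is usually easier to adjust if the stanza rules change (for example, if the terminator differs).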
Answer 1 (score: 0)