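Python regex not working

Time: 2017-05-19 01:56:11

Tags: python regex parsing

I have the following raw text output, from which I need to extract selected sections, but my regex in Python also picks up sections I do not want. My string is: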

label 123 start
    int
    some other random text
    exit
exit
label 576 start
    int
    some other random text
    exit
exit
label 888 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit
label 902 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit
label 456 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit

From the text above I want to get the following as individual items:

Item 1
label 888 start
    explanation jgfjgjgj 
    some random text 
    exit
up up
exit
Item 2
label 902 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit
Item 3
label 456 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit

I have the following regex:

(label)\s\d{1,4}(.*?)(?=\s*explanation)(.*?)\s+up up

This also captures the following two items, which I do not want:

label 123 start
    int
    some other random text
    exit
exit
label 576 start
    int
    some other random text
    exit
exit

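My pattern is built on the idea that it looks ahead for the word "explanation" and only captures items that start at "label" and finish at "up up". However, the first match includes everything from label 123 and label 576. I thought the lookahead would stop that, but those sections are captured anyway.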
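
The behaviour described above can be reproduced with a short script. This is a minimal sketch on a shortened copy of the sample text; it assumes the pattern was run with re.DOTALL (the question does not say which flags were used, but without that flag "." cannot cross line breaks and the pattern finds nothing here). It shows that the lookahead only checks what follows the point where the first lazy group stops, so nothing prevents that group from running across earlier "label" sections.

```python
import re

# Shortened version of the sample text from the question.
text = """label 123 start
    int
    exit
exit
label 888 start
    explanation jgfjgjgj
    exit
up up
exit
label 902 start
    explanation jgfjgjgj
    exit
up up
exit
"""

# Pattern from the question. re.DOTALL is an assumption (see the note above).
pattern = re.compile(
    r"(label)\s\d{1,4}(.*?)(?=\s*explanation)(.*?)\s+up up",
    re.DOTALL,
)

matches = [m.group(0) for m in pattern.finditer(text)]

# The first match begins at "label 123" and runs through the first "up up",
# swallowing the unwanted section on the way.
for i, match in enumerate(matches, start=1):
    print(f"Match {i}\n{match}\n")
```

The lazy group grows one character at a time until the rest of the pattern can succeed, and the first place where that happens is just before the first "explanation" line, however many sections lie in between.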

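2 answers:

Answer 0: (score: 0)

I am assuming that what you are looking for is a stanza which:

  • starts with an unindented line beginning with label followed by an integer
  • contains an indented line starting with explanation
  • does not contain any other unindented line until it is terminated by an unindented up up followed by an unindented exit.

This corresponds to the regex: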

(?mx)^label[ \t]+\d{1,4}.*     # Unindented line starting label
     (?:\n[ \t]+.*)*?          # Some indented lines (non-greedy)
     (?:\n[ \t]+explanation.*) # Indented explanation
     (?:\n[ \t]+.*)*           # More indented lines
     \nup\ up\nexit\n          # Termination sequence including final newline

Test:

import re

text="""label 123 start
    int
    some other random text
    exit
exit
label 576 start
    int
    some other random text
    exit
exit
label 888 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit
label 902 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit
label 456 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit
"""

r = r'''(?mx)
    ^label[ \t]+\d{1,4}.*     # Unindented line starting label
    (?:\n[ \t]+.*)*?          # Some indented lines (non-greedy)
    (?:\n[ \t]+explanation.*) # Indented explanation
    (?:\n[ \t]+.*)*           # More indented lines
    \nup\ up\nexit\n          # Termination sequence including final newline
'''

for i, m in enumerate(re.findall(r, text)):
    print("Item "+str(i)+"\n"+m)

Item 0
label 888 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit

Item 1
label 902 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit

Item 2
label 456 start
    explanation jgfjgjgj
    some random text 
    exit
up up
exit
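
Because the pattern above ends with \nexit\n, it needs a newline after the final exit, so the last stanza would be missed if the text did not end with a newline. The following is a minimal sketch of one way to relax that, not part of the original answer: it swaps the trailing \n for a multiline $ anchor and wraps the pattern in a helper. The names SECTION and extract_sections are made up for this illustration.

```python
import re

# Same structure as the pattern above; only the last line differs.
SECTION = re.compile(
    r"""
    ^label[ \t]+\d{1,4}.*      # unindented line starting with label and a number
    (?:\n[ \t]+.*)*?           # indented lines before the explanation (non-greedy)
    \n[ \t]+explanation.*      # indented line starting with explanation
    (?:\n[ \t]+.*)*            # remaining indented lines
    \nup\ up\nexit$            # terminator; $ also matches at the end of the text
    """,
    re.MULTILINE | re.VERBOSE,
)


def extract_sections(text):
    """Return every stanza that contains an indented explanation line."""
    return [m.group(0) for m in SECTION.finditer(text)]


# Hypothetical sample without a trailing newline after the last exit.
sample = (
    "label 123 start\n    int\n    exit\nexit\n"
    "label 888 start\n    explanation jgfjgjgj\n    exit\nup up\nexit"
)

for i, section in enumerate(extract_sections(sample)):
    print(f"Item {i}\n{section}\n")
```

With this variant the returned stanzas no longer include the trailing newline, which is why the loop prints one explicitly.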

Answer 1: (score: 0)

Check the following regex:

(label\s\d{1,4}\sstart(\s*explanation)(.*?)up\sup\s*exit)

It should work, provided "." is allowed to match newlines (re.DOTALL in Python). Click here for a demo.
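
A minimal sketch of how this pattern might be used from Python, on a shortened copy of the sample text. The re.DOTALL flag is an assumption: the answer does not state any flags, but "(.*?)" has to span several lines. Since the pattern contains three capturing groups, re.findall would return tuples, so the sketch uses finditer and takes group 1, the outer group that wraps the whole item.

```python
import re

# Shortened version of the sample text from the question.
text = """label 123 start
    int
    exit
exit
label 888 start
    explanation jgfjgjgj
    some random text
    exit
up up
exit
label 902 start
    explanation jgfjgjgj
    exit
up up
exit
"""

# Pattern from the answer; re.DOTALL is assumed so that "." matches newlines.
pattern = re.compile(
    r"(label\s\d{1,4}\sstart(\s*explanation)(.*?)up\sup\s*exit)",
    re.DOTALL,
)

# Group 1 is the outer group, i.e. the complete item.
items = [m.group(1) for m in pattern.finditer(text)]

for i, item in enumerate(items, start=1):
    print(f"Item {i}\n{item}\n")
```

This works on the sample because "explanation" has to follow "start" with only whitespace in between, so a match can never begin at a section whose first indented line is something else.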