Question

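I have multiple log files containing LDAP entries, and I am trying to match only the entries whose createtimestamp falls within a certain date range, while capturing the whole entry rather than just the timestamp. The entries look like this: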

dn: ....
otherattr: 
...
createtimestamp: 20130621061525Z

The problem is that the match also includes every entry that comes before the one I want:

dn: ....
otherattr: 
...
createtimestamp: 20121221082545Z

dn: ....
otherattr: 
...
createtimestamp: 20130621061525Z

Here is the expression:

dn_search = re.compile(r'dn: (.*?)createtimestamp: 20130[4-6]\d+?Z', flags=re.M|re.S)

I have tried a few other expressions, but I either get only the createtimestamp or I get entries I don't want. Any ideas?

Answer 1

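Description

This regular expression assumes that each group of text starts with dn: and ends with a blank line. It matches the whole group of lines and captures the value of the createtimestamp field in group 1.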

^dn:(?=(?:(?!^createtimestamp:|^dn:|^\s*(?:\r|\n|$)|\Z).)*^createtimestamp:\s*([^\s\r\n]*))(?:(?!^dn:|^\s*(?:\r|\n|$)|\Z).)*
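If the one-liner is hard to read, the same pattern can be laid out with re.VERBOSE so each part carries a comment. This is only a sketch: it assumes the blank-line alternative is meant to read `\r|\n|$`, and the name ENTRY_RE and the sample text are made up for illustration.

```python
import re

# Same pattern as above, split over several lines with comments (re.X ignores the layout).
ENTRY_RE = re.compile(r"""
    ^dn:                        # an entry starts with dn: at the beginning of a line
    (?=                         # lookahead: locate the timestamp without consuming text
        (?:
            (?! ^createtimestamp: | ^dn: | ^\s*(?:\r|\n|$) | \Z )
            .                   # any character that does not begin one of those lines
        )*
        ^createtimestamp:\s*
        ([^\s\r\n]*)            # group 1: the timestamp value
    )
    (?:
        (?! ^dn: | ^\s*(?:\r|\n|$) | \Z )
        .                       # consume the entry up to the next dn: or blank line
    )*
""", re.M | re.S | re.I | re.X)

# Hypothetical sample data, shaped like the entries in the question.
sample = (
    "dn: cn=a\n"
    "other: x\n"
    "createtimestamp: 20121221082545Z\n"
    "\n"
    "dn: cn=b\n"
    "other: y\n"
    "createtimestamp: 20130621061525Z\n"
)

for m in ENTRY_RE.finditer(sample):
    print(repr(m.group(1)))
```

The key point is that the repeated `.` is guarded by a negative lookahead, so a single match can never run past the start of the next entry, which is what the lazy `(.*?)` in the question ends up doing.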

Python code example

Link to a working example: http://repl.it/J0t

Code

import re

# Sample text; the _1/_2 suffixes only show which entry each capture came from.
string = """dn: ....
otherattr: 
...
createtimestamp: 20121221082545Z_1

dn: ....
otherattr: 
...
createtimestamp: 20130621061525Z_2
"""

# group(0) is the whole entry, group(1) is the createtimestamp value.
for matchObj in re.finditer(r'^dn:(?=(?:(?!^createtimestamp:|^dn:|^\s*(?:\r|\n|$)|\Z).)*^createtimestamp:\s*([^\s\r\n]*))(?:(?!^dn:|^\s*(?:\r|\n|$)|\Z).)*', string, re.M|re.I|re.S):
    print("-------")
    print("matchObj.group(1) : ", matchObj.group(1))

Returns

-------
matchObj.group(1) :  20121221082545Z_1
-------
matchObj.group(1) :  20130621061525Z_2
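The example above captures every entry, so the date restriction from the question still has to be applied. One way to do that, as a rough sketch, is to keep the per-entry pattern and test group(1) against the question's `20130[4-6]` range afterwards. The names ENTRY_RE, WANTED and entries_in_range, as well as the sample data, are assumptions made for this illustration, and the pattern is written with `\r|\n|$` as the blank-line alternative.

```python
import re

# Per-entry pattern: group(0) is the whole entry, group(1) the createtimestamp value.
ENTRY_RE = re.compile(
    r'^dn:(?=(?:(?!^createtimestamp:|^dn:|^\s*(?:\r|\n|$)|\Z).)*'
    r'^createtimestamp:\s*([^\s\r\n]*))'
    r'(?:(?!^dn:|^\s*(?:\r|\n|$)|\Z).)*',
    re.M | re.I | re.S)

# Date range taken from the question: April to June 2013.
WANTED = re.compile(r'20130[4-6]\d+Z')


def entries_in_range(text):
    """Return the full text of each entry whose createtimestamp is in range."""
    return [
        m.group(0).rstrip()
        for m in ENTRY_RE.finditer(text)
        if WANTED.fullmatch(m.group(1))
    ]


# Hypothetical sample data, shaped like the entries in the question.
sample = (
    "dn: cn=first\n"
    "other: x\n"
    "createtimestamp: 20121221082545Z\n"
    "\n"
    "dn: cn=second\n"
    "other: y\n"
    "createtimestamp: 20130621061525Z\n"
)

for entry in entries_in_range(sample):
    print(entry)
    print("-------")
```

Filtering in a second step keeps the entry pattern independent of the date range, so the range can change without touching the long regex.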

Answer 2

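Do not try to parse LDIF by hand. The format is not complicated, but attribute and name escaping, continuation of long lines, and similar details will trip you up. Use the LDIF parser from python-ldap.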
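To make the line-continuation pitfall concrete: in LDIF, a line that begins with a single space continues the previous line, so a regex run over the raw text can miss a createtimestamp that was folded. The sketch below shows only that unfolding step. The function name unfold and the sample entry are invented for illustration, and it does not handle base64 values or the other cases a real parser covers.

```python
def unfold(ldif_text):
    """Join LDIF continuation lines (starting with one space) onto the previous line."""
    lines = []
    for line in ldif_text.splitlines():
        if line.startswith(" ") and lines:
            # Drop the single leading space and append to the line being continued.
            lines[-1] += line[1:]
        else:
            lines.append(line)
    return lines


# Hypothetical entry whose timestamp was folded across two lines.
folded = "dn: cn=example\ncreatetimestamp: 2013062\n 1061525Z\n"
print(unfold(folded))
```

A pattern such as `createtimestamp: 20130[4-6]\d+Z` would not match the folded text directly, which is the kind of surprise a dedicated parser avoids.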

Unable to match LDAP entries with a multiline regex
