Question

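I am trying to use regular expressions (import re) to extract the information I want from a log file.

Update: added the permissions for the C:\WINDOWS\security folder, which breaks all of the sample code.

Say the log format is: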

C:\:
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files
    \Everyone   Allowed:    Read & Execute
    (No auditing)

C:\WINDOWS\system32:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Modify
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

C:\WINDOWS\system32\config:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Read & Execute
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

C:\WINDOWS\security:
    BUILTIN\Users   Allowed:    Special Permissions: 
            Traverse Folder
            Read Attributes
            Read Permissions
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Traverse Folder
            Read Attributes
            Read Permissions
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

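This repeats for several other directories. How can I split the text into paragraphs and then check each one for lines containing Special Permissions:?

Like this:

1. Split the whole string1 into sections such as C:\ and C:\WINDOWS\system32.
2. Look for lines containing 'Special Permissions:'.
3. Display the whole line, for example: C:\: BUILTIN\Users Allowed: Special Permissions: \n\ Create Folders\n\ BUILTIN\Users Allowed: Special Permissions: \n\ Create Files\n\
4. Repeat for the next 'paragraph'.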

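I was thinking of:

1. Searching the whole text file for r"(\w+:\\)(\w+\\?)*:", which returns the path.
2. Using string functions or a regex to get the rest of the output.
3. Removing every line other than the Special Permissions ones.
4. Displaying the result, then repeating from step 1.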
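To make step 1 concrete, here is a minimal sketch of what that path pattern picks up. It is written for Python 3 (the rest of this thread is Python 2), the `^` anchor plus `re.MULTILINE` is my addition, and `HEADER`, `sample` and `paths` are made-up names with a shortened sample rather than the real log:

```python
import re

# The header pattern from step 1, anchored to the start of a line.
# The anchor and the MULTILINE flag are additions for this sketch.
HEADER = re.compile(r"^(\w+:\\)(\w+\\?)*:", re.MULTILINE)

# Shortened sample in the same shape as the log (illustrative only).
sample = r"""C:\:
    BUILTIN\Users   Allowed:    Read & Execute

C:\WINDOWS\system32:
    BUILTIN\Power Users Allowed:    Modify
"""

# Collect the full text of every header line that matches.
paths = [m.group(0) for m in HEADER.finditer(sample)]
print(paths)
```

Since `\w` does not match spaces, a header such as `C:\Program Files:` would not be matched by this pattern, so it probably needs loosening if such directories show up in the log.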

But I feel this is not efficient.

Can anyone guide me on this? Thanks.

Sample output:

C:\:
BUILTIN\Users   Allowed:    Special Permissions:
Create Folders
BUILTIN\Users   Allowed:    Special Permissions:
Create Files

C:\WINDOWS\system32:
BUILTIN\Power Users Allowed:    Special Permissions: 
Delete

C:\WINDOWS\security:
BUILTIN\Users   Allowed:    Special Permissions: 
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users Allowed:    Special Permissions: 
Traverse Folder
Read Attributes
Read Permissions

C:\WINDOWS\system32\config does not appear because none of its lines contain Special Permissions.

The template I am using:

import re

text = ""

def main():
    global text
    # Read the whole file into the module-level string; the with block
    # closes the file even if reading fails.
    with open('DirectoryPermissions.xls', 'r') as f:
        text = f.read()
    print text

def regex():
    global text
    <insert code here>

if __name__ == '__main__':
    main()
    regex()

Answer 1

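# Read the log line by line from the file,
# rather than splitting a big string containing the file.

section = None          # header of the current paragraph, "" once printed
inspecialperm = False   # True while inside a "Special Permissions" block
with open("testdata.txt") as w:
    for line in w:
        # Continuation lines are indented by 12 spaces
        if not line.startswith("            "):
            inspecialperm = False

        if not line.strip():
            # A blank line ends the current paragraph
            section = None

        elif section is None:
            # First line after a blank line is the path header
            section = line

        elif 'Special Permissions' in line:
            if section:
                print section,
                section = ""
            inspecialperm = True
            print line,

        elif inspecialperm:
            print line,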
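For anyone on Python 3, here is a rough sketch of the same line-by-line idea as a generator, so the caller decides what to do with the lines. The function name `special_permission_lines` and the shortened `sample` are made up for illustration, and it assumes continuation lines are always indented deeper than the entry they belong to:

```python
def special_permission_lines(lines):
    """Yield each path header once, followed by its Special Permissions lines."""
    header = None         # header of the current paragraph, None once yielded
    entry_indent = None   # indent of the current Special Permissions entry
    for raw in lines:
        line = raw.rstrip()
        indent = len(line) - len(line.lstrip())
        if not line:
            # A blank line ends the paragraph
            header, entry_indent = None, None
        elif indent == 0:
            # An unindented line is a new path header
            header, entry_indent = line, None
        elif "Special Permissions:" in line:
            if header is not None:
                yield header
                header = None
            entry_indent = indent
            yield line.strip()
        elif entry_indent is not None and indent > entry_indent:
            # Deeper-indented line: belongs to the current entry
            yield line.strip()
        else:
            entry_indent = None


# Shortened sample in the same shape as the log (illustrative only).
sample = r"""C:\:
    BUILTIN\Administrators  Allowed:    Full Control
    BUILTIN\Users   Allowed:    Special Permissions:
            Create Folders
    (No auditing)

C:\WINDOWS\system32\config:
    BUILTIN\Users   Allowed:    Read & Execute
    (No auditing)

C:\WINDOWS\security:
    BUILTIN\Users   Allowed:    Special Permissions:
            Traverse Folder
            Read Attributes
    NT AUTHORITY\SYSTEM Allowed:    Full Control
"""

result = list(special_permission_lines(sample.splitlines()))
for item in result:
    print(item)
```

With a real file you would pass the open file object instead of `sample.splitlines()`, since the function only needs an iterable of lines.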

Answer 2

If you parse the string with "split & strip", you do not need the re module at all, and it is more efficient:

for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    paragraph = paragraph.replace(': \n', ': ') # hack to have permissions in same line
    for line in paragraph.split('\n'):
        if 'Special Permissions: ' in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Path "%s" has special permission "%s"' % (path, permission)

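Replace the print statement with whatever suits your needs.

Edit: as pointed out in the comments, the previous solution does not work with the new input lines in the edited question, but here is how to fix it (still more efficient than using regular expressions):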

for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    owner = None
    for line in paragraph.split('\n'):
        if owner is not None and ':' not in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Owner "%s" has special permission "%s" on path "%s"' % (owner, permission, path)
        else:
            owner = line.split(' Allowed:', 1)[0].strip() if line.endswith('Special Permissions: ') else None
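If you would rather collect the results than print them, the same split-and-strip idea could look roughly like this in Python 3. This is only a sketch: `parse_special_permissions` and `sample` are made-up names, paragraphs are assumed to be separated by exactly one blank line with `\n` line endings, and a permission is taken to be any line indented deeper than its owner line:

```python
def parse_special_permissions(text):
    """Map each path to a list of (owner, permission) pairs."""
    result = {}
    for paragraph in text.strip().split("\n\n"):
        lines = paragraph.split("\n")
        path = lines[0].strip().rstrip(":")
        owner, owner_indent = None, 0
        for line in lines[1:]:
            stripped = line.strip()
            indent = len(line) - len(line.lstrip())
            if stripped.endswith("Special Permissions:"):
                # Remember who the following permission lines belong to
                owner = stripped.split("Allowed:", 1)[0].strip()
                owner_indent = indent
            elif owner is not None and stripped and indent > owner_indent:
                result.setdefault(path, []).append((owner, stripped))
            else:
                owner = None
    return result


# Shortened sample in the same shape as the log (illustrative only).
sample = r"""C:\:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Users   Allowed:    Special Permissions:
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions:
            Create Files
    (No auditing)

C:\WINDOWS\security:
    BUILTIN\Power Users Allowed:    Special Permissions:
            Traverse Folder
            Read Attributes
    (No auditing)
"""

parsed = parse_special_permissions(sample)
for path, pairs in parsed.items():
    for owner, permission in pairs:
        print('Owner "%s" has special permission "%s" on path "%s"' % (owner, permission, path))
```

Comparing indentation instead of testing for a colon keeps a trailing line like (No auditing) from being counted as a permission when it directly follows a Special Permissions block.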

Answer 3

Similar to milkypostman's solution, but in the format you were trying to get the output in:

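lines = string1.splitlines()
header = None        # path line of the current paragraph, until it is printed
first = True         # no paragraph has been printed yet
for index, line in enumerate(lines):
    if line and not line[0].isspace():
        # An unindented, non-empty line starts a new paragraph
        header = line
    elif "Special Permissions" in line:
        if header is not None:
            # Print the path once, with a blank line between paragraphs
            if not first:
                print
            print header
            header = None
            first = False
        print line.strip()
        # Print every following line that is indented deeper than this one
        indent = len(line) - len(line.lstrip())
        offset = 1
        while index + offset < len(lines):
            following = lines[index + offset]
            if len(following) - len(following.lstrip()) <= indent:
                break
            print following.strip()
            offset += 1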

Answer 4

Here is a solution using the re module and its findall method.

data = '''\
C:\:
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control 
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files
    \Everyone   Allowed:    Read & Execute
    (No auditing)

C:\WINDOWS\system32:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Modify
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

C:\WINDOWS\system32\config:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Read & Execute
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)
'''

if __name__ == '__main__':
    import re

    # A regular expression to match a section "C:...."
    cre_par = re.compile(r'''
                ^C:.*?
                ^\s*$''', re.DOTALL | re.MULTILINE | re.VERBOSE)

    # A regular expression to match a "Special Permissions" line, and the
    # following line.
    cre_permissions = re.compile(r'''(^.*Special\ Permissions:\s*\n.*)\n''', 
                                re.MULTILINE | re.VERBOSE)

    # Create list of strings to output.
    out = []
    for t in cre_par.findall(data):
        # Look for permissions within the current section only, and skip
        # sections that have no "Special Permissions" entries.
        permissions = cre_permissions.findall(t)
        if permissions:
            out += [t[:t.find('\n')]] + permissions + ['']

    # Join output list of strings together using end-of-line character
    print '\n'.join(out)

Here is the output this produces:

C:\:
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files

C:\WINDOWS\system32:
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete
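The pattern above captures only one line after each "Special Permissions:" entry. As a possible variation (not part of the code above), here is a Python 3 sketch of a pattern that keeps going for as long as the following lines are indented deeper than the entry, using a named backreference to the entry's own indent. `ENTRY`, `special_permissions` and `sample` are made-up names:

```python
import re

# One entry line ending in "Special Permissions:", followed by one or more
# lines that start with the same indent plus at least one more space or tab.
ENTRY = re.compile(
    r"^(?P<indent>[ \t]+)(?P<entry>\S.*Special Permissions:)[ \t]*\n"
    r"(?P<perms>(?:(?P=indent)[ \t]+\S.*(?:\n|\Z))+)",
    re.MULTILINE,
)


def special_permissions(section):
    """Return (entry, [permissions]) pairs found in one section of the log."""
    return [
        (m.group("entry"), [p.strip() for p in m.group("perms").splitlines()])
        for m in ENTRY.finditer(section)
    ]


# Shortened sample in the same shape as the log (illustrative only).
sample = r"""C:\WINDOWS\security:
    BUILTIN\Users   Allowed:    Special Permissions:
            Traverse Folder
            Read Attributes
            Read Permissions
    BUILTIN\Administrators  Allowed:    Full Control
    (No auditing)
"""

found = special_permissions(sample)
for entry, perms in found:
    print(entry)
    for perm in perms:
        print(perm)
```

The backreference `(?P=indent)` is what stops the match at the next entry: a line at the same indent as the entry has no extra whitespace left over for the `[ \t]+` that follows.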

Answer 5

Thanks to milkypostman, scoffey, and the rest, I came up with this solution:

def regex():
    global text
    for paragraph in text.split('\n\n'):
        lines = paragraph.split('\n', 1)
        #personal modifier to choose certain output only
        if lines[0].startswith(('C:\\:', 'C:\\WINDOWS\\system32:', 'C:\\WINDOWS\\security:')):
            print lines[0]
            iterables = re.finditer(r".*Special Permissions: \n(\s+[a-zA-Z ]+\n)*", lines[1])
            for items in iterables:
                #cosmetic fix
                parsedText = re.sub(r"\n$", "", items.group(0))
                parsedText = re.sub(r"^\s+", "", parsedText)
                parsedText = re.sub(r"\n\s+", "\n", parsedText)
                print parsedText
            print

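I will still look through all of the posted code (especially scoffey's, since I never knew pure string manipulation was so powerful). Thanks for the insight!

Of course, this is not optimal, but it works for my case. If you have any suggestions, feel free to post them.

Output: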

C:\Python27>openfile.py
C:\:
BUILTIN\Users   Allowed:        Special Permissions:
Create Folders
BUILTIN\Users   Allowed:        Special Permissions:
Create Files

C:\WINDOWS\security:
BUILTIN\Users   Allowed:        Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users     Allowed:        Special Permissions:
Traverse Folder
Read Attributes
Read Permissions

C:\WINDOWS\system32:
BUILTIN\Power Users     Allowed:        Special Permissions:
Delete

Regular expression for multi-line checking

5 answers: