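I am trying to use regular expressions (import re) to extract the information I want from a log file.
Update: I added the C:\WINDOWS\security folder permissions, which broke all of the sample code.
Say the format of the log is: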
C:\:
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
\Everyone Allowed: Read & Execute
(No auditing)
C:\WINDOWS\system32:
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Power Users Allowed: Modify
BUILTIN\Power Users Allowed: Special Permissions:
Delete
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)
C:\WINDOWS\system32\config:
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Power Users Allowed: Read & Execute
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)
C:\WINDOWS\security:
BUILTIN\Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)
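The same layout repeats for other directories. How can I split the log into paragraphs and then check each one for lines containing Special Permissions:?
Like this, for C:\ and C:\WINDOWS\system32:
C:\: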
BUILTIN\Users Allowed: Special Permissions: \n\
Create Folders\n\
BUILTIN\Users Allowed: Special Permissions: \n\
Create Files\n\
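I was thinking of:
1. Searching the whole text file for r"(\w+:\\)(\w+\\?)*:", which returns the path
2. Using string functions or a regex to get the rest of the output
3. Removing every line except the Special Permissions ones
4. Displaying the result, then repeating from step 1
But I feel this is not efficient.
Can anyone guide me? Thanks.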
Example output:
C:\:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
C:\WINDOWS\system32:
BUILTIN\Power Users Allowed: Special Permissions:
Delete
C:\WINDOWS\security:
BUILTIN\Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
C:\WINDOWS\system32\config does not appear because none of its lines contains a special permission.
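Step 1 of the plan above (finding the path headers) can be sketched as follows. This is a minimal illustration, not the asker's code: it uses a simpler anchored pattern than the one in step 1, it is written for Python 3, and it assumes path headers start at column 0 while owner lines are indented. `SAMPLE`, `HEADER`, and `find_paths` are hypothetical names.

```python
import re

# Hypothetical sample in the log's layout. The indentation is an assumption:
# path headers at column 0, owner lines indented, permission names deeper.
SAMPLE = (
    "C:\\:\n"
    "    BUILTIN\\Users Allowed: Read & Execute\n"
    "    BUILTIN\\Users Allowed: Special Permissions: \n"
    "        Create Folders\n"
    "\n"
    "C:\\WINDOWS\\system32\\config:\n"
    "    BUILTIN\\Users Allowed: Read & Execute\n"
)

# A path header is a line that starts with a drive letter and ends with ":".
HEADER = re.compile(r"^(\w:\\.*):[ \t]*$", re.MULTILINE)


def find_paths(text):
    """Return every path header in text, without its trailing colon."""
    return HEADER.findall(text)


print(find_paths(SAMPLE))
```

Because the pattern has a single capturing group, `findall` returns plain strings rather than tuples, which keeps steps 2 to 4 simple.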
The template I am using:
import re

text = ""

def main():
    global text
    # Read the whole log into the module-level string.
    f = open('DirectoryPermissions.xls', 'r')
    for line in f:
        text = text + line
    f.close()
    print text

def regex():
    global text
    <insert code here>

if __name__ == '__main__':
    main()
    regex()
Answer 0 (score: 2)
# Read the log line by line instead of splitting one big string.
section = None
inspecialperm = False
with open("testdata.txt") as w:
    for line in w:
        # Permission names sit on indented lines; widen this prefix to
        # match the log's actual indentation.
        if not line.startswith(" "):
            inspecialperm = False
        if section is None:
            # The first line of a paragraph is the path header.
            section = line
        elif not line.strip():
            # A blank line ends the current section.
            section = None
        elif 'Special Permissions' in line:
            if section:
                # Print the header once, before its first entry.
                print section,
                section = ""
            inspecialperm = True
            print line,
        elif inspecialperm:
            print line,
Answer 1 (score: 1)
If you parse the string with split and strip, you do not need the re module at all, and it is more efficient:
for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    paragraph = paragraph.replace(': \n', ': ')  # hack to have permissions in same line
    for line in paragraph.split('\n'):
        if 'Special Permissions: ' in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Path "%s" has special permission "%s"' % (path, permission)
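Replace the print statement with whatever fits your needs.
Edit: as pointed out in the comments, the previous solution does not work with the new input lines in the edited question, but here is how to fix it (still more efficient than using regular expressions):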
for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    owner = None
    for line in paragraph.split('\n'):
        if owner is not None and ':' not in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Owner "%s" has special permission "%s" on path "%s"' % (owner, permission, path)
        else:
            owner = line.split(' Allowed:', 1)[0].strip() if line.endswith('Special Permissions: ') else None
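The same split-and-strip idea can be wrapped in a function that returns its results instead of printing them. This is only a sketch in Python 3, assuming paragraphs are separated by blank lines and that permission names never contain a colon; `special_permissions` and `SAMPLE` are hypothetical names, and the indentation in the sample is assumed.

```python
def special_permissions(text):
    """Return (path, owner, permission) tuples parsed from the log text."""
    found = []
    for paragraph in text.split("\n\n"):
        lines = paragraph.strip("\n").split("\n")
        path = lines[0].strip().rstrip(":")
        owner = None  # owner of the Special Permissions entry being read
        for line in lines[1:]:
            if line.rstrip().endswith("Special Permissions:"):
                owner = line.split(" Allowed:", 1)[0].strip()
            elif owner is not None and ":" not in line and line.strip():
                # A colon-free line under an entry is a permission name.
                found.append((path, owner, line.strip()))
            else:
                owner = None  # any other line ends the entry
    return found


# Hypothetical sample; indentation widths are an assumption.
SAMPLE = (
    "C:\\WINDOWS\\security:\n"
    "    BUILTIN\\Users Allowed: Special Permissions: \n"
    "        Traverse Folder\n"
    "        Read Attributes\n"
    "    BUILTIN\\Administrators Allowed: Full Control\n"
    "    (No auditing)\n"
    "\n"
    "C:\\WINDOWS\\system32\\config:\n"
    "    BUILTIN\\Users Allowed: Read & Execute\n"
)

for path, owner, permission in special_permissions(SAMPLE):
    print(path, owner, permission, sep=" | ")
```

Returning tuples keeps the parsing separate from the formatting, so the caller can filter by path or group by owner as needed.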
Answer 2 (score: 1)
Similar to milkypostman's solution, but formatting the output the way you were trying to:
lines = string1.splitlines()
separator = None
for index, line in enumerate(lines):
    if line == "":
        separator = line
    elif "Special Permissions" in line:
        if separator is not None:
            print separator
        print line.lstrip()
        offset = 0
        while True:
            # if the line's last 2 characters are ": "
            if lines[index+offset][-2:] == ": ":
                print lines[index+offset+1].lstrip()
                offset += 1
            else:
                break
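As written, the inner loop appears to print only the one line after each Special Permissions entry, because the second check looks at the permission name itself, which does not end in ": ". A possible way to collect every permission name under an entry is sketched below in Python 3. It assumes permission names contain no colon and that audit notes such as (No auditing) start with a parenthesis; `following_permissions` and `LINES` are hypothetical names.

```python
def following_permissions(lines, index):
    """Return the permission names listed directly under lines[index]."""
    names = []
    for line in lines[index + 1:]:
        stripped = line.strip()
        # Stop at a blank line, the next owner entry, or an audit note.
        if not stripped or ":" in stripped or stripped.startswith("("):
            break
        names.append(stripped)
    return names


# Hypothetical sample lines; indentation widths are an assumption.
LINES = [
    "C:\\WINDOWS\\security:",
    "    BUILTIN\\Users Allowed: Special Permissions: ",
    "        Traverse Folder",
    "        Read Attributes",
    "        Read Permissions",
    "    BUILTIN\\Administrators Allowed: Full Control",
    "    (No auditing)",
]

print(following_permissions(LINES, 1))
```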
Answer 3 (score: 0)
Here is a solution using the re module and its findall method.
data = '''\
C:\:
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
\Everyone Allowed: Read & Execute
(No auditing)
C:\WINDOWS\system32:
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Power Users Allowed: Modify
BUILTIN\Power Users Allowed: Special Permissions:
Delete
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)
C:\WINDOWS\system32\config:
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Power Users Allowed: Read & Execute
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)
'''
if __name__ == '__main__':
    import re
    # A regular expression to match a section "C:...."
    cre_par = re.compile(r'''
        ^C:.*?
        ^\s*$''', re.DOTALL | re.MULTILINE | re.VERBOSE)
    # A regular expression to match a "Special Permissions" line, and the
    # following line.
    cre_permissions = re.compile(r'''(^.*Special\ Permissions:\s*\n.*)\n''',
                                 re.MULTILINE | re.VERBOSE)
    # Create list of strings to output.
    out = []
    for t in cre_par.findall(data):
        out += [t[:t.find('\n')]] + cre_permissions.findall(data) + ['']
    # Join output list of strings together using end-of-line character
    print '\n'.join(out)
Here is the generated output:
C:\:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
BUILTIN\Power Users Allowed: Special Permissions:
Delete
C:\WINDOWS\system32:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
BUILTIN\Power Users Allowed: Special Permissions:
Delete
C:\WINDOWS\system32\config:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
BUILTIN\Power Users Allowed: Special Permissions:
Delete
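In the output above every path lists the same three entries, because the loop calls cre_permissions.findall(data) on the whole text rather than on the matched section t. A minimal sketch of a per-section search follows, in Python 3. It is not the answerer's code: it assumes sections are separated by blank lines and that permission names consist of letters and spaces only; `ENTRY`, `report`, and `SAMPLE` are hypothetical names.

```python
import re

# Hypothetical sample with blank-line-separated sections (indentation assumed).
SAMPLE = (
    "C:\\:\n"
    "    BUILTIN\\Users Allowed: Special Permissions: \n"
    "        Create Folders\n"
    "    \\Everyone Allowed: Read & Execute\n"
    "    (No auditing)\n"
    "\n"
    "C:\\WINDOWS\\system32:\n"
    "    BUILTIN\\Power Users Allowed: Special Permissions: \n"
    "        Delete\n"
    "    (No auditing)\n"
)

# An entry line followed by any number of indented, letters-only lines.
ENTRY = re.compile(
    r"^[ \t]*(.*Special Permissions:)[ \t]*\n((?:[ \t]+[A-Za-z ]+(?:\n|\Z))*)",
    re.MULTILINE,
)


def report(text):
    """Return the output lines, searching each section separately."""
    out = []
    for section in re.split(r"\n[ \t]*\n", text.strip("\n")):
        header, _, body = section.partition("\n")
        entries = ENTRY.findall(body)
        if not entries:
            continue  # skip paths that have no special permissions
        out.append(header)
        for entry, names in entries:
            out.append(entry)
            out.extend(name.strip() for name in names.splitlines())
        out.append("")
    return out


print("\n".join(report(SAMPLE)))
```

Searching only the body of each section is what ties an entry to its own path; paths without any entry are skipped, matching the example output in the question.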
Answer 4 (score: 0)
Thanks to milkypostman, scoffey, and everyone else, I came up with this solution:
def regex():
    global text
    for paragraph in text.split('\n\n'):
        # lines[0] is the path header, lines[1] is the rest of the paragraph
        lines = paragraph.split('\n', 1)
        #personal modifier to choose certain output only
        if lines[0].startswith('C:\\:') or lines[0].startswith('C:\\WINDOWS\\system32:') or lines[0].startswith('C:\\WINDOWS\\security:'):
            print lines[0]
            iterables = re.finditer(r".*Special Permissions: \n(\s+[a-zA-Z ]+\n)*", lines[1])
            for items in iterables:
                #cosmetic fix
                parsedText = re.sub(r"\n$", "", items.group(0))
                parsedText = re.sub(r"^\s+", "", parsedText)
                parsedText = re.sub(r"\n\s+", "\n", parsedText)
                print parsedText
            print
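I will still go through all of the posted code (especially scoffey's, since I never knew plain string manipulation could be this powerful). Thanks for the insights!
Of course, this is not optimal, but it works for my case. If you have any suggestions, feel free to post them.
Output: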
C:\Python27>openfile.py
C:\:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
C:\WINDOWS\security:
BUILTIN\Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
C:\WINDOWS\system32:
BUILTIN\Power Users Allowed: Special Permissions:
Delete