I need to convert a string into a dictionary. More specifically, I need to parse Auditd messages into a dictionary. For example, this string:
msg=audit(123.123:123): pid=2514 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12 30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/sudo" hostname=? a ddr=? terminal=/dev/pts/0 res=success'
Here are some more examples:
msg=audit(1234902.147:88): pid=254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12 30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/grep" hostname=? a ddr=? terminal=/dev/pts/0 res=success'
msg=audit(432787023.324:77): pid=1254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12 30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/tail" hostname=? a ddr=? terminal=/dev/pts/0 res=success'
What I want is:
{
    msg: 'audit(...',
    pid: ...,
    uid: ...,
    mess: {
        op: PAM...,
        grantors: pam_unix...
    }
}
I'm really racking my brain over this. I know I need a regex, and that it needs to be recursive, but I'd really appreciate some help.
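Python's built-in re module has no recursive patterns, but in the sample lines only the single-quoted msg='...' field contains nested key=value pairs, so one extra level of parsing may be enough. A minimal sketch under that assumption: parse_pairs is a hypothetical name, and storing the quoted field under "mess" simply mirrors the desired output above.

```python
import re

# A key=value token whose value is '...' (single-quoted), "..." (double-quoted),
# or a run of non-space characters. Tokens without '=' (e.g. "30", "a") are skipped.
TOKEN = re.compile(r"""(\w+)=('[^']*'|"[^"]*"|\S+)""")


def parse_pairs(text):
    """Parse key=value pairs, descending one level into single-quoted values."""
    result = {}
    for key, value in TOKEN.findall(text):
        if value.startswith("'"):
            # Assumption: the single-quoted field is the inner msg='...'; store it
            # under "mess" so it does not overwrite the outer msg=audit(...) value.
            result["mess"] = parse_pairs(value[1:-1])
        else:
            # Drop surrounding double quotes, e.g. acct="lemoney" -> lemoney
            result[key] = value.strip('"')
    return result


sample = ("msg=audit(123.123:123): pid=2514 uid=1000 auid=1000 ses=3 "
          "subj=random_ex:random_ex:random_ex:d3-d3:w0.c12 30 "
          "msg='op=PAM:accounting grantors=pam_unix,pam_localuser "
          "acct=\"lemoney\" exe=\"/usr/bin/sudo\" hostname=? a ddr=? "
          "terminal=/dev/pts/0 res=success'")

print(parse_pairs(sample))
```

All values stay strings, and bare tokens such as 30 and a are silently dropped, so this only fits if those tokens carry no meaning for you.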
Answer 0 (score: 1)
Here you go (with the help of some regular expressions):
import re
string = """
msg=audit(1234902.147:88): pid=254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12 30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/grep" hostname=? a ddr=? terminal=/dev/pts/0 res=success'
msg=audit(432787023.324:77): pid=1254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12 30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/tail" hostname=? a ddr=? terminal=/dev/pts/0 res=success'
"""
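# lines regex: one audit record per line
entries = re.compile(r'^msg=.+', re.MULTILINE)

# outer regex (raw strings avoid invalid-escape warnings for \w and \S)
rx = re.compile(r"""
    ((\w+)='([^']+)')   # longer group: key='quoted value'
    |                   # or
    (\w+=\S+)           # single items: key=value
""", re.VERBOSE)

# inner regex: key=value pairs inside the quoted group
ry = re.compile(r"(\w+)=(\S+)")

for entry in entries.finditer(string):
    result = dict()
    for match in rx.finditer(entry.group(0)):
        if match.group(4) is not None:
            # single item; split on the first '=' only
            key, value = match.group(4).split('=', 1)
            result[key] = value
        else:
            # quoted msg='...' group: parse its contents into a nested dict
            inner = dict()
            for m in ry.finditer(match.group(3)):
                inner[m.group(1)] = m.group(2)
            result["mess"] = inner
    print(result)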
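The dictionaries printed by the loop above keep every value as a string (uid is '1000', not 1000). If you want numeric fields as integers, which the unquoted pid: ... in the desired output may hint at (an assumption), a small post-processing step could look like this. convert_numbers is a hypothetical helper, and the parsed dict is hand-written to mirror the shape of the printed results.

```python
def convert_numbers(d):
    """Return a copy of d with all-digit string values turned into ints.

    Nested dicts (such as the "mess" entry) are converted recursively;
    any other value (including None) is copied unchanged.
    """
    out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            out[key] = convert_numbers(value)
        elif isinstance(value, str) and value.isdecimal():
            # isdecimal() only accepts characters int() can parse
            out[key] = int(value)
        else:
            out[key] = value
    return out


# hand-written example shaped like one printed result
parsed = {"msg": "audit(1234902.147:88):", "pid": "254", "uid": "1000",
          "mess": {"op": "PAM:accounting", "res": "success"}}
print(convert_numbers(parsed))
```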
Answer 1 (score: 0)
Here's one possibility, and no regular expressions were harmed in the making of it:
import shlex
from collections import OrderedDict

def split_on_equals_to_dict(string_to_split):
    split_dict = OrderedDict()
    for item in shlex.split(string_to_split):
        number_of_equals = item.count('=')
        if number_of_equals == 0:
            # bare token without '=' (e.g. "30" or "a"): keep it as a key with no value
            split_dict[item] = None
        elif number_of_equals == 1:
            # simple key=value pair
            key, value = item.split('=')
            split_dict[key] = value
        else:
            # e.g. msg=op=PAM:...: split on the first '=' and parse the rest recursively
            tag, value = item.split('=', 1)
            split_dict[tag] = split_on_equals_to_dict(value)
    return split_dict

log_str="""audit(123.123:123): pid=2514 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12 30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/sudo" hostname=? a ddr=? terminal=/dev/pts/0 res=success'"""
log_dict = split_on_equals_to_dict(log_str)
print(log_dict)
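The provided string is somewhat ambiguous. I handled this by using an OrderedDict: tokens without an = (such as 30 and a) are kept as keys with the value None, in their original order.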
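The shlex approach works because shlex.split follows POSIX shell quoting rules: the single-quoted msg='...' part stays one token with its quotes removed, and a second shlex.split on that token's value strips the double quotes around values such as "lemoney". A small sketch of the two levels, on a shortened version of the sample line (the shortening is only for readability):

```python
import shlex

sample = ("pid=2514 ses=3 msg='op=PAM:accounting acct=\"lemoney\" "
          "exe=\"/usr/bin/sudo\" res=success'")

# Level 1: the single-quoted part survives as one token, quotes removed
tokens = shlex.split(sample)
print(tokens)

# Level 2: split that token on its first '=' and tokenize the remainder
key, rest = tokens[2].split("=", 1)
inner = shlex.split(rest)
print(key, inner)
```

Because split_on_equals_to_dict treats any token with more than one = as nested, an ordinary value that itself contains = would also be parsed as a nested dict. Likewise, if log_str kept the leading msg= before audit(...), the nested msg entry would overwrite that value, since both use the key msg.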