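Converting a string to a Python dictionary using recursion and regular expressions

Time: 2017-01-25 19:01:03

Tags: python regex dictionary

I need to convert a string into a dictionary. More specifically, I need to parse Auditd messages into dictionaries. Example string: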

msg=audit(123.123:123): pid=2514 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/sudo" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'

Here are a few other examples:

msg=audit(1234902.147:88): pid=254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/grep" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'

msg=audit(432787023.324:77): pid=1254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/tail" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'

What I want is:

{
  msg: 'audit(...',
  pid: ...,
  uid: ...,
  mess: {
    op: PAM...,
    grantors: pam_unix...
  }
}

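I have been racking my brain over this. I know I need a regular expression and that it needs to be recursive, but I would really appreciate some help.

2 answers:

Answer 0: (score: 1)

Here you go (with the help of some regular expressions):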

import re

string = """
msg=audit(1234902.147:88): pid=254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/grep" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'

msg=audit(432787023.324:77): pid=1254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/tail" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'
"""

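# Match each log line (every entry starts with "msg=")
entries = re.compile(r'^msg=.+', re.MULTILINE)

# Outer regex: either a quoted group (key='...') or a plain key=value item
rx = re.compile(r"""
    ((\w+)='([^']+)')   # quoted group: key in group 2, body in group 3
    |                   # or
    (\w+=\S+)           # plain key=value item (group 4)
    """, re.VERBOSE)

# Inner regex: key=value pairs inside the quoted group
ry = re.compile(r"(\w+)=(\S+)")

for entry in entries.finditer(string):
    result = {}
    for match in rx.finditer(entry.group(0)):
        if match.group(4) is not None:
            # Plain item: split on the first '=' only, so values may contain '='
            key, value = match.group(4).split('=', 1)
            result[key] = value
        else:
            # Quoted group: parse its body into a nested dict
            inner = {}
            for m in ry.finditer(match.group(3)):
                inner[m.group(1)] = m.group(2)
            result["mess"] = inner

    print(result)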

See a demo on ideone.com

Answer 1: (score: 0)

Here is one possibility; no regular expressions were harmed in the making:
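Since the question asks for recursion, here is a minimal sketch of how the same two-level idea could be folded into one recursive function. It is not the answerer's code: `parse_audit_line`, `TOKEN_RE`, and the `nested_key` parameter are hypothetical names, the sample line is a shortened version of the one in the question, and stripping the double quotes from values is an extra choice made here for illustration.

```python
import re

# Either a single-quoted group (key='...') or a plain key=value token.
TOKEN_RE = re.compile(
    r"(?P<qkey>\w+)='(?P<qbody>[^']*)'|(?P<key>\w+)=(?P<value>\S+)"
)


def parse_audit_line(line, nested_key="mess"):
    """Parse one audit line into a dict (illustrative helper, not a library API).

    The single-quoted group is parsed recursively and stored under nested_key.
    """
    result = {}
    for match in TOKEN_RE.finditer(line):
        if match.group("qkey") is not None:
            # The quoted body holds key=value pairs itself, so recurse into it.
            result[nested_key] = parse_audit_line(match.group("qbody"), nested_key)
        else:
            # Assumption: drop the double quotes around values like acct="lemoney".
            result[match.group("key")] = match.group("value").strip('"')
    return result


# Shortened sample based on the line in the question.
sample = ("msg=audit(123.123:123): pid=2514 uid=1000 "
          "msg='op=PAM:accounting acct=\"lemoney\" res=success'")
parsed = parse_audit_line(sample)
print(parsed["pid"])           # 2514
print(parsed["mess"]["acct"])  # lemoney
```

All values stay strings in this sketch; converting `pid` or `uid` to integers would be a separate step.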

import shlex
from collections import OrderedDict

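def split_on_equals_to_dict(string_to_split):
    """Split a shell-quoted string of key=value tokens into an OrderedDict."""
    split_dict = OrderedDict()
    # shlex.split keeps quoted sections (e.g. msg='...') together as one token
    for item in shlex.split(string_to_split):
        number_of_equals = item.count('=')
        if number_of_equals == 0:
            # Bare token with no '=': keep it as a key with no value
            split_dict[item] = None
        elif number_of_equals == 1:
            tag, value = item.split('=')
            split_dict[tag] = value
        else:
            # Several '=' signs: treat the value as a nested key=value string
            tag, value = item.split('=', 1)
            split_dict[tag] = split_on_equals_to_dict(value)
    return split_dict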

log_str="""audit(123.123:123): pid=2514 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/sudo" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'"""
log_dict = split_on_equals_to_dict(log_str)

The provided string is somewhat ambiguous. I handled this by using an OrderedDict.
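To see why this approach recurses on the quoted `msg` token, it may help to look at what `shlex.split` returns on its own. The snippet below is a small illustration using a shortened sample string (an assumption made here, not the full log line): the single-quoted section comes back as one token with several `=` signs, while a bare token such as `30` has none.

```python
import shlex

# Shortened sample: one plain pair, one bare token, one single-quoted group.
sample = "pid=2514 30 msg='op=PAM:accounting acct=\"lemoney\" res=success'"

tokens = shlex.split(sample)
print(tokens)
# ['pid=2514', '30', 'msg=op=PAM:accounting acct="lemoney" res=success']

# Splitting the quoted group's value again yields the inner key=value tokens;
# this time the double quotes are consumed by shlex.
inner = shlex.split(tokens[2].split('=', 1)[1])
print(inner)
# ['op=PAM:accounting', 'acct=lemoney', 'res=success']
```

Note that bare tokens end up as keys with a value of `None`, so stray fragments in the input show up in the resulting dictionary rather than being silently dropped.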